The waiting time for m mutations 



by Jason Schweinsberg* 
University of California, San Diego 

August 8, 2008 



Abstract 

We consider a model of a population of fixed size N in which each individual gets replaced 
at rate one and each individual experiences a mutation at rate $\mu$. We calculate the asymptotic
distribution of the time that it takes before there is an individual in the population with m
mutations. Several different behaviors are possible, depending on how $\mu$ changes with N.
These results have applications to the problem of determining the waiting time for regulatory 
sequences to appear and to models of cancer development. 

1 Introduction 

It is widely accepted that many types of cancer arise as a result of not one but several mutations. 
For example, Moolgavkar and Luebeck [26] write that "the concept of multistage carcinogenesis 
is one of the central dogmas of cancer research", while Beerenwinkel et al. [5] write that "the
current view of cancer is that tumorigenesis is due to the accumulation of mutations in oncogenes,
tumor suppressor genes, and genetic instability genes." The idea that several mutations are
required for cancer goes back at least to 1951, when Muller [28] wrote, "There are, however,
reasons for inferring that many or most cancerous growths would require a series of mutations in 
order for cells to depart sufficiently from the normal." Three years later, Armitage and Doll [2] 
proposed a simple mathematical multi-stage model of cancer. Motivated by the goal of explaining 
the power law relationship between age and incidence of cancer that had been observed by 
Fisher and Holloman [12] and Nordling [29], they formulated a model in which a cell that has 
already experienced $k-1$ mutations experiences a $k$th mutation at rate $u_k$. They showed that
asymptotically as $t \to 0$, the probability that the $m$th mutation occurs in the time interval
$[t, t + dt]$ is given by

$$r(t)\,dt = \frac{u_1 u_2 \cdots u_m\, t^{m-1}}{(m-1)!}\,dt. \tag{1}$$

They fit their model to data from 17 different types of cancer, and found that for many types 
of cancer the incidence rate r(t) increases like the fifth or sixth power of age, suggesting that 
perhaps 6 or 7 mutations are involved in cancer progression. Because of concerns that having 6 or
7 stages may not be biologically plausible, Armitage and Doll [3] later proposed a two-stage model
as an alternative. A more general two-stage model was proposed by Moolgavkar and Knudson
[24], who demonstrated that two-stage models are flexible enough to fit a wide range of data if
one allows for the possibilities that the number of healthy cells with no mutations may change 
over time, and that cells with one mutation may divide rapidly, causing the second mutation, 
and therefore the onset of cancer, to happen more quickly than it otherwise would. 

Since the seminal papers of Armitage and Doll, multi-stage models have been applied to a 
number of different types of cancer. Knudson [19], [15] discovered that retinoblastoma is a result
of getting two mutations. Multi-stage models of colon cancer have been studied extensively. 
Moolgavkar and Luebeck [26] argued that a three-stage model fit the available data slightly better 
than a two-stage model. Later in [22], they found a good fit to a four-stage model. Calabrese 
et al. [6] worked with data from 1022 cancers from 9 hospitals in Finland and estimated that
between 4 and 9 mutations are required for cancer, with fewer mutations being required for 
hereditary cancers than for sporadic (nonhereditary) cancers. A recent study [32] of over 13,000 
genes from breast and colon cancers suggests that as many as 14 mutations may be involved 
in colon cancer and as many as 20 may be involved in breast cancer. Multi-stage models have 
also been fit to data on lung cancer [13] and T-cell leukemia [31]. See [20] for a recent survey of
applications of multi-stage cancer models. 

In this paper, we formulate a simple mathematical model and calculate the asymptotic dis- 
tribution of the time that it takes for cancer to develop. Our model is as follows. Consider a 
population of fixed size N. We think of the individuals in the population as representing N cells, 
which could develop cancer. We assume that the population evolves according to the Moran 
model [27]. That is, each individual independently lives for an exponentially distributed amount
of time with mean one, and then is replaced by a new individual whose parent is chosen at random
from the $N$ individuals in the population (including the one being replaced). These births and
deaths represent cell division and cell death. We also assume that each individual independently
experiences mutations at times of a rate $\mu$ Poisson process, and each new individual born has
the same number of mutations as its parent. We refer to an individual that has $j$ mutations as
a type $j$ individual, and a mutation that takes an individual's number of mutations from $j-1$
to $j$ as a type $j$ mutation. Let $X_j(t)$ be the number of type $j$ individuals at time $t$. For each
positive integer $m$, let $\tau_m = \inf\{t : X_m(t) > 0\}$ be the first time at which there is an individual in
the population with $m$ mutations. We view $\tau_m$ as representing the time that it takes for cancer
to develop. Clearly $\tau_1$ has the exponential distribution with rate $N\mu$ because the $N$ individuals
are each experiencing mutations at rate $\mu$. Our goal in this paper is to compute the asymptotic
distribution of $\tau_m$ for $m \geq 2$.
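
The model just described is straightforward to simulate. The following is a minimal Python sketch, not taken from the paper: it tracks only the number of individuals of each type and runs a Gillespie-style simulation in which events occur at total rate $N(1 + \mu)$. The function names `pick_type` and `simulate_tau_m` are illustrative assumptions, and no attempt is made at efficiency.

```python
import random


def pick_type(counts, N, rng):
    """Return the type of an individual chosen uniformly from the N individuals."""
    r = rng.randrange(N)
    for j, c in enumerate(counts):
        if r < c:
            return j
        r -= c
    raise AssertionError("counts must sum to N")


def simulate_tau_m(N, mu, m, rng):
    """Simulate the Moran model with mutation and return one sample of tau_m."""
    counts = [0] * (m + 1)  # counts[j] = number of type j individuals
    counts[0] = N
    t = 0.0
    total_rate = N * (1.0 + mu)  # N replacements at rate 1, N mutations at rate mu
    while True:
        t += rng.expovariate(total_rate)
        if rng.random() < mu / (1.0 + mu):
            # Mutation: a uniformly chosen individual goes from type j to j + 1.
            j = pick_type(counts, N, rng)
            counts[j] -= 1
            counts[j + 1] += 1
            if j + 1 == m:
                return t
        else:
            # Replacement: the parent is chosen among all N individuals,
            # including the one being replaced.
            dead = pick_type(counts, N, rng)
            parent = pick_type(counts, N, rng)
            counts[dead] -= 1
            counts[parent] += 1
```

For instance, averaging `simulate_tau_m(10, 0.1, 1, rng)` over many runs with `rng = random.Random(0)` should give a value close to $1/(N\mu) = 1$, in line with the exponential distribution of $\tau_1$.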

When a new mutation occurs, eventually either all individuals having the mutation die, caus- 
ing the mutation to disappear from the population, or the mutation spreads to all individuals in 
the population, an event which we call fixation. Because a mutation initially appears on only one 
individual and is assumed to offer no selective advantage or disadvantage, each mutation fixates 
with probability $1/N$. Once one mutation fixates, the problem reduces to waiting for $m-1$ additional
mutations. However, it is possible for one individual to accumulate $m$ mutations before
any mutation fixates in the population, an event which is sometimes called stochastic tunneling
(see [17]). It is also possible for there to be $j$ fixations, and then for one individual to get $m-j$
mutations that do not fixate. Because there are different ways to get $m$ mutations, the limiting
behavior is surprisingly complex, as the form of the limiting distribution of $\tau_m$ depends on how
$\mu$ varies as a function of $N$.

There is another source of biological motivation for this model coming from the evolution 
of regulatory sequences. Regulatory sequences are short DNA sequences that control how genes 
are expressed. Getting a particular regulatory sequence would require several mutations, so to 
understand the role that regulatory sequences play in evolution, one needs to understand how 
long it takes before these mutations occur. See Durrett and Schmidt [8], [9] for work in this
direction. 

In addition to this motivation from biology, there is mathematical motivation for studying 
this model as well. The model is simple and natural and, as will be seen from the results, gives 
rise to different asymptotic behavior depending on how $\mu$ scales as a function of $N$. In particular,
the usual diffusion scaling from population genetics in which $N\mu$ tends to a constant is just one
of several regimes. 

This paper can be viewed as a sequel to [10], in which the authors considered a more general
model in which an individual with $k-1$ mutations experiences a $k$th mutation at rate $u_k$. The
model considered here is the special case in which $u_k = \mu$ for all $k$, so we are assuming that all
mutation rates are the same. However, whereas in [10] results were obtained only for specific
ranges of the mutation rates $u_k$, here we are able to obtain all possible limiting behaviors for
the case in which the mutation rates are the same. We also emphasize that although our model
accounts for cell division and cell death, we assume that the rates of cell division and cell death
are the same, unlike many models in the biology literature which specify that individuals with
between 1 and $m-1$ mutations have a selective advantage, allowing their numbers to increase
rapidly (see, for example, O [Ml (HE EH [5]). As we explain below, several special cases of our 
results have previously appeared in the biology literature, especially for the two-stage models 
when m = 2. However, here we are able to give complete asymptotic results for all m, as well as 
to provide rigorous proofs of the results. We state our main results in section 2. Proofs are given 
in sections 3, 4, and 5. 

2 Main results 

In this section, we state our results on the limiting behavior of the waiting time for an individual 
to acquire m mutations, and we explain the heuristics behind the results. Many of the heuristics 
are based on approximation by branching processes. In the Moran model, if $k$ individuals have
a mutation, then the number of individuals with the mutation is decreasing by one at rate
$k(N-k)/N$ (because the $k$ individuals with the mutation are dying at rate $k$, and the probability
that the replacement individual does not have a mutation is $(N-k)/N$) and is increasing by one
at rate $k(N-k)/N$ (because the $N-k$ individuals without a mutation are each dying at rate one, and
the replacement individual has a mutation with probability $k/N$). Therefore, when $k$ is much
smaller than N, the number of individuals with a given mutation behaves approximately like a 
continuous-time branching process in which each individual gives birth and dies at rate one. 

To keep track of further mutations, it is natural to consider a continuous-time multitype 
branching process in which initially there is a single type 1 individual, each individual gives birth 
and dies at rate 1, and a type $j$ individual mutates to type $j+1$ at rate $\mu$. If $p_j$ denotes the
probability that there is eventually a type $j$ individual in the population, then

$$p_j = \frac{1}{2+\mu}\left(2p_j - p_j^2\right) + \frac{\mu}{2+\mu}\,p_{j-1}. \tag{2}$$

To see this result, condition on the first event. With probability $1/(2+\mu)$, the first event is a
death, and there is no chance of getting a type $j$ individual. With probability $1/(2+\mu)$, the first
event is a birth, in which case each individual has a type $j$ descendant with probability $p_j$ and
therefore the probability that at least one has a type $j$ descendant is $2p_j - p_j^2$. With probability
$\mu/(2+\mu)$, the first event is a mutation to type 2, in which case the probability of a type $j$
descendant is $p_{j-1}$ because $j-2$ further mutations are needed. Equation (2) can be rewritten as
$p_j^2 + \mu p_j - \mu p_{j-1} = 0$, and the positive solution is

$$p_j = \frac{-\mu + \sqrt{\mu^2 + 4\mu p_{j-1}}}{2}.$$

When $\mu$ is small, the second term under the square root dominates the numerator, and we get
$p_j \approx \sqrt{\mu p_{j-1}}$. Since $p_1 = 1$, the approximation $p_j \approx \mu^{1-2^{-(j-1)}}$ follows by induction.
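
The recursion can be iterated numerically to see how quickly the approximation takes hold. The sketch below is illustrative only and assumes the positive root $p_j = (-\mu + \sqrt{\mu^2 + 4\mu p_{j-1}})/2$ with $p_1 = 1$; the function names `type_j_probabilities` and `small_mu_approximation` are hypothetical.

```python
import math


def type_j_probabilities(mu, m):
    """Return [p_1, ..., p_m] for the multitype branching process."""
    p = [1.0]  # p_1 = 1: the initial individual is already of type 1
    for _ in range(2, m + 1):
        prev = p[-1]
        # Positive root of p^2 + mu * p - mu * prev = 0.
        p.append((-mu + math.sqrt(mu * mu + 4.0 * mu * prev)) / 2.0)
    return p


def small_mu_approximation(mu, j):
    """Leading-order approximation mu^(1 - 2^-(j-1)) to p_j."""
    return mu ** (1.0 - 2.0 ** (-(j - 1)))
```

Comparing `type_j_probabilities(mu, m)[j - 1]` with `small_mu_approximation(mu, j)` for decreasing `mu` shows the ratio approaching one, although the convergence becomes slower as $j$ grows because the relative error is of order $\mu^{2^{-(j-1)}}$.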

Because the Moran model can be approximated by a branching process when the number of
mutant individuals is much smaller than $N$, this result suggests that under appropriate conditions,
the probability that a type 1 individual in the population has a type $m$ descendant should be
approximately $\mu^{1-2^{-(m-1)}}$. Proposition 1 below, which is a special case of Proposition 4.1 in [10],
establishes that this approximation is indeed valid. Here and throughout the paper, the mutation
rate $\mu$ depends on $N$ even though we do not record this dependence in the notation. Also, if $f$
and $g$ are two functions of $N$, we write $f(N) \sim g(N)$ if $f(N)/g(N) \to 1$ as $N \to \infty$. We also
write $f(N) \ll g(N)$ if $f(N)/g(N) \to 0$ as $N \to \infty$ and $f(N) \gg g(N)$ if $f(N)/g(N) \to \infty$ as
$N \to \infty$.

Proposition 1. Consider a model which is identical to the model described in the introduction,
except that initially there is one individual of type 1 and $N-1$ individuals of type 0, and no further
type 1 mutations are possible. Let $q_m$ be the probability that a type $m$ individual eventually is
born. Suppose that $N\mu^{1-2^{-(m-1)}} \to \infty$ as $N \to \infty$, and that there is a constant $a > 0$ such that
$N^a \mu \to 0$. Then

$$q_m \sim \mu^{1-2^{-(m-1)}}.$$

Note that $q_m$ is the probability that a given type 1 individual eventually has a type $m$
descendant. Because a number of our arguments involve considering each type 1 mutation and 
its descendants separately from other type 1 mutations, this result will be used repeatedly. 

To understand the order of magnitude of $q_m$ another way, recall that the probability that
the total progeny of a critical branching process exceeds $M$ is of order $M^{-1/2}$ (see, for example,
[14]), so if there are $L$ independent branching processes, the most successful will have a total
progeny of order $L^2$. Furthermore, the sum of the total progenies of the $L$ processes will also be
of order $L^2$. Therefore, if there are $L$ type 1 mutations, the number of descendants they produce
will be of order $L^2$. Each type 1 descendant will experience a type 2 mutation before dying with
probability approximately $\mu$, so this should lead to on the order of $L^2\mu$ type 2 mutations. It
follows that the number of type 2 descendants should be on the order of $L^4\mu^2$, and this will lead
to on the order of $L^4\mu^3$ type 3 mutations. Repeating this reasoning, we see that the number of
type $m$ mutations should be of order $L^{2^{m-1}}\mu^{2^{m-1}-1}$. By setting this expression equal to one and
solving for $L$, we see that it should take on the order of $\mu^{-(1-2^{-(m-1)})}$ type 1 mutations before one
of these mutations gets a type $m$ descendant. That is, the probability that a type 1 individual
has a type $m$ descendant is of order $\mu^{1-2^{-(m-1)}}$.
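
The exponent bookkeeping in this heuristic can be checked mechanically. The following sketch is only an illustration of the argument above, with hypothetical function names: it tracks the pair $(a, b)$ such that a count is of order $L^a\mu^b$, squaring when passing from mutations to descendants and multiplying by $\mu$ when passing from descendants to mutations of the next type.

```python
from fractions import Fraction


def heuristic_exponents(m):
    """Return (a, b) with the number of type m mutations of order L**a * mu**b."""
    a, b = 1, 0  # start from L type 1 mutations
    for _ in range(m - 1):
        a, b = 2 * a, 2 * b  # descendants: square of the number of mutations
        b += 1  # each descendant acquires the next mutation with probability ~ mu
    return a, b


def required_type1_mutations_exponent(m):
    """Exponent e such that L is of order mu**e when L**a * mu**b is of order one."""
    a, b = heuristic_exponents(m)
    return Fraction(-b, a)
```

For example, `heuristic_exponents(3)` returns `(4, 3)`, matching the $L^4\mu^3$ type 3 mutations in the text, and `required_type1_mutations_exponent(m)` equals $-(1 - 2^{-(m-1)})$.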

2.1 Gamma limits when $N\mu \to 0$

Because mutations occur at times of a Poisson process of rate $N\mu$, there will be approximately
$N\mu T$ mutations by time $T$. We have seen that after a mutation occurs, the number of individuals
with the mutation behaves approximately like a critical branching process. By a famous result
of Kolmogorov [21], the probability that a critical branching process survives for time $t$ is of
order $1/t$. This means that if we have $N\mu T$ independent critical branching processes, the most
successful will survive for a time which is of order $N\mu T$. Therefore, all mutations that appear
before time $T$ should either die out or fixate after being in the population for a time of order
$N\mu T$. If $N\mu \ll 1$, then this time is much smaller than the time $T$ that we have to wait for the
mutation. Therefore, when $N\mu \ll 1$, we can consider each mutation separately and determine
whether either it fixates or gives birth to a type $m$ descendant without fixating. We can ignore
the time that elapses between when the original mutation appears, and when either it fixates
or the descendant with $m$ mutations is born. The importance of the condition $N\mu \ll 1$ was
previously noted, for example, in [17] and [15].

We have already seen that a mutation fixates with probability $1/N$ and gives birth to a type $j$
descendant with probability approximately $\mu^{1-2^{-(j-1)}}$. Therefore, fixation of some mutation will
happen first if $N\mu^{1-2^{-(j-1)}} \to 0$ as $N \to \infty$ or, equivalently, if $\mu \ll N^{-2^{j-1}/(2^{j-1}-1)}$. This leads
to the following result when $N\mu \ll 1$. Note that when $m = 2$, the result in part 1 of the theorem
matches (12.12) of [30], while the result in part 3 matches (12.14) of [30]; see also section 3 of
[18], section 4 of [16], and Theorem 1 of [9].

Theorem 2. Let $Z_1, Z_2, \ldots$ be independent random variables having the exponential distribution
with rate 1, and let $S_k = Z_1 + \cdots + Z_k$, which has a gamma distribution with parameters $(k, 1)$.

1. If $\mu \ll N^{-2}$, then $\mu\tau_m \to_d S_{m-1}$.

2. If $N^{-2^{j-1}/(2^{j-1}-1)} \ll \mu \ll N^{-2^j/(2^j-1)}$ for some $j = 2, \ldots, m-1$, then $\mu\tau_m \to_d S_{m-j}$.

3. If $N^{-2^{m-1}/(2^{m-1}-1)} \ll \mu \ll N^{-1}$, then $N\mu^{2-2^{-(m-1)}}\tau_m \to_d Z_1$.

To understand this result, note that in part 1 of the theorem, when $\mu \ll N^{-2}$, fixation occurs
before any individual gets two mutations without a fixation. Therefore, to get $m$ mutations, we
have to wait for $m-1$ different mutations to fixate, and this is the sum of $m-1$ independent
exponential waiting times. The exponential random variables have rate parameter $\mu$, because
there are mutations at rate $N\mu$ and each fixates with probability $1/N$, so mutations that fixate
occur at rate $\mu$. Once $m-1$ fixations have occurred, the $m$th mutation occurs quickly, at rate
$N\mu$ rather than at rate $\mu$, so only the waiting times for the $m-1$ fixations contribute to the
limiting distribution. For part 2 of the theorem, when $N^{-2^{j-1}/(2^{j-1}-1)} \ll \mu \ll N^{-2^j/(2^j-1)}$ for
some $j = 2, \ldots, m-1$, fixation occurs before an individual can accumulate $j+1$ mutations, but
an individual can accumulate $j$ mutations before fixation. Therefore, we wait for $m-j$ fixations,
and then the remaining $j$ mutations happen without fixation. Because the $j$ mutations without
fixation happen on a faster time scale, the limit is a sum of $m-j$ exponential random variables.
In part 3, we get $m$ mutations before the first fixation, and there is an exponential waiting time
until the first mutation that is successful enough to produce an offspring with $m$ mutations.
Mutations happen at rate $N\mu$, and mutations are successful with probability approximately
$\mu^{1-2^{-(m-1)}}$, which explains the time-scaling factor of $N\mu^{2-2^{-(m-1)}}$.

Part 3 of Theorem 2 is the special case of Theorem 2 of [10] in which $u_j = \mu$ for all $j$.
Condition (i) of that theorem becomes the condition $\mu \ll N^{-1}$, while condition (iv) becomes the
condition $N^{-2^{m-1}/(2^{m-1}-1)} \ll \mu$. Parts 1 and 2 of Theorem 2 above are proved in section 3.

2.2 The borderline cases 

Theorem 2 does not cover the cases when $\mu$ is of the order $N^{-2^{j-1}/(2^{j-1}-1)}$ for some $j$. On this
time scale, for the reasons discussed in the previous section, we can still neglect the time between
when a mutation first appears in the population and when it either fixates or dies out because
this time will be much shorter than the time we had to wait for the mutation to occur. However,
fixations happen on the same time scale as events in which an individual gets $j$ mutations without
fixation. Therefore, to get to $m$ mutations, we start with $m-j$ fixations. Then we can either have
another fixation (followed by $j-1$ additional mutations, which happen on a faster time scale) or
we can get $j$ mutations without any fixation. The waiting time is the sum of $m-j$ independent
exponential random variables with rate $\mu$ and another exponential random variable having the
faster rate $\lambda_j\mu$. The last exponential random variable comes from waiting for a mutation that
either fixates or has a descendant with $j-1$ additional mutations but does not fixate. This leads
to the following result.

Theorem 3. Suppose $\mu \sim AN^{-2^{j-1}/(2^{j-1}-1)}$ for some $j = 2, \ldots, m$ and some constant $A > 0$.
Let $Z_1, Z_2, \ldots$ be independent random variables having the exponential distribution
with rate 1, and let $S_k = Z_1 + \cdots + Z_k$. Let $Y$ be independent of $Z_1, Z_2, \ldots$, and assume that $Y$
has the exponential distribution with rate $\lambda_j$, where

$$\lambda_j = \frac{\sum_{k=1}^{\infty} A^{2k(1-2^{-(j-1)})}/((k-1)!\,(k-1)!)}{\sum_{k=1}^{\infty} A^{2k(1-2^{-(j-1)})}/(k!\,(k-1)!)}. \tag{3}$$

Then $\mu\tau_m \to_d S_{m-j} + Y$.

This result when $j = m$ is the special case of Theorem 3 of [10] in which $u_j = \mu$ for all $j$. As
will be seen in section 3, the result for $j \leq m-1$ follows easily from the result when $j = m$.

To explain where the formula for $\lambda_j$ comes from, we review here the outline of the proof of
Theorem 3 in [10]. Assume that we already have $m-j$ fixations, and now we need to wait either
for another fixation or for a mutation that will have a descendant with $j-1$ additional mutations.
We cannot approximate the probability of the latter event by $\mu^{1-2^{-(j-1)}}$ in this case because
to get $j-1$ further mutations, the number of individuals with the original mutation will need
to be of order $N$, so the branching process approximation does not hold. Instead, we consider
a model in which there is one individual with a mutation at time zero, and $X(t)$ denotes the
number of individuals with the mutation at time $t$. At time $t$, the individuals with the mutation
each experience further mutations at rate $\mu$, and these further mutations each have probability
approximately $\mu^{1-2^{-(j-2)}}$ of having an offspring with $j$ total mutations. Therefore, at time $t$,
successful mutations are happening at rate $\gamma X(t)$, where

$$\gamma = \mu \cdot \mu^{1-2^{-(j-2)}} = \mu^{2(1-2^{-(j-1)})}.$$

At time $t$, the jump rate of the process is $2X(t)(N-X(t))/N$. Therefore, by making a time-change,
we can work instead with a continuous-time simple random walk $(Y(t), t \geq 0)$ which
jumps at rate one, and the mutation rate at time $t$ becomes

$$\gamma Y(t) \cdot \frac{N}{2Y(t)(N-Y(t))} = \frac{\gamma}{2(1-Y(t)/N)}.$$



Therefore, the probability that there is no fixation and no further successful mutation is
approximately

$$E\left[\exp\left(-\int_0^T \frac{\gamma}{2(1-Y(t)/N)}\,dt\right)\mathbf{1}_{\{Y(T)=0\}}\right],$$

where $T = \inf\{t : Y(t) \in \{0, N\}\}$. Simple random walk converges to Brownian motion, so
if instead of starting with just one mutant individual we assume that $Y(0) = \lfloor Nx \rfloor$, where
$0 < x < 1$, then the above expression is approximately

$$u(x) = E\left[\exp\left(-\frac{A^{2(1-2^{-(j-1)})}}{2}\int_0^U \frac{1}{1-B(s)}\,ds\right)\mathbf{1}_{\{B(U)=0\}}\right], \tag{4}$$

where $U = \inf\{t : B(t) \in \{0,1\}\}$ and $(B(t), t \geq 0)$ is Brownian motion started at $x$. Here we
are also using that $N^2\gamma \sim A^{2(1-2^{-(j-1)})}$, where the factor of $N^2$ comes from the time change in
replacing random walk with Brownian motion. Since the probability that we get either fixation
or a successful mutation is $1 - u(x)$, and we need to take a limit as the number of mutants at
time zero gets small, we have

$$\lambda_j = \lim_{x \to 0} \frac{1-u(x)}{x}.$$

Thus, the problem reduces to evaluating the Brownian functional (4). One can obtain a differential
equation for $u(x)$ using the Feynman-Kac formula, and then get a series solution to the
differential equation, from which the formula (3) follows. Details of this argument occupy section
6 of [10].
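
For readers who want numerical values of $\lambda_j$, the series can be summed directly. The sketch below is an illustration under a stated assumption: it takes $\lambda_j$ to be the ratio of $\sum_{k \geq 1} c^k/((k-1)!\,(k-1)!)$ to $\sum_{k \geq 1} c^k/(k!\,(k-1)!)$ with $c = A^{2(1-2^{-(j-1)})}$, which is the series one obtains by solving $u''(x) = c\,u(x)/(1-x)$ with $u(0) = 1$ and $u(1) = 0$. The function name `lambda_j` and the truncation level are hypothetical choices.

```python
def lambda_j(A, j, terms=500):
    """Evaluate the assumed series ratio for lambda_j by truncated summation."""
    c = A ** (2.0 * (1.0 - 2.0 ** (-(j - 1))))
    numerator = 0.0
    denominator = 0.0
    term = c  # k = 1 term of the numerator: c^1 / (0! * 0!)
    for k in range(1, terms + 1):
        numerator += term  # c^k / ((k-1)! (k-1)!)
        denominator += term / k  # c^k / (k! (k-1)!)
        term *= c / (k * k)  # advance to c^(k+1) / (k! k!)
    return numerator / denominator
```

Under this assumption, `lambda_j(A, j)` tends to 1 as $A \to 0$, so that only fixation contributes, and grows like $A^{1-2^{-(j-1)}}$ as $A \to \infty$, which is consistent with the neighboring regimes of Theorem 2.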



2.3 Rapid mutations 

It remains to handle the case when $N\mu \not\to 0$. With this scaling, fixation will not occur before time
$\tau_m$. However, the waiting time between the type 1 mutation that will eventually produce a type
$m$ descendant and the actual appearance of the type $m$ descendant can no longer be ignored. As
a result, waiting times are no longer sums of exponential random variables. Instead, we obtain
the following result. The $m = 2$ case of part 3 is equivalent to the special case of Theorem 1 in
[10] when $u_1 = u_2 = \mu$.

Theorem 4. We have the following limiting results when $N\mu \not\to 0$.

1. If $\mu \gg N^{-2/m}$, then

$$\lim_{N \to \infty} P(\tau_m > N^{-1/m}\mu^{-1}t) = \exp\left(-\frac{t^m}{m!}\right).$$

2. If $N^{-1/(1+(m-j-2)2^{-(j+1)})} \ll \mu \ll N^{-1/(1+(m-j-1)2^{-j})}$ for some $j = 1, \ldots, m-2$, then

$$\lim_{N \to \infty} P(\tau_m > N^{-1/(m-j)}\mu^{-1-(1-2^{-j})/(m-j)}t) = \exp\left(-\frac{t^{m-j}}{(m-j)!}\right).$$

3. If $\mu \sim AN^{-1/(1+(m-j-1)2^{-j})}$ for some $j = 1, \ldots, m-1$ and some constant $A > 0$, then

$$\lim_{N \to \infty} P(\tau_m > \mu^{-(1-2^{-j})}t) = \exp\left(-\frac{A^{1+(m-j-1)2^{-j}}}{(m-j-1)!}\int_0^t (t-s)^{m-j-1}\,\frac{1-e^{-2s}}{1+e^{-2s}}\,ds\right).$$

We now explain the intuition behind these results. Recall that $X_j(t)$ is the number of individuals
with $j$ mutations at time $t$. Because there are $N$ individuals getting mutations at rate
$\mu$, we have $E[X_1(t)] \approx N\mu t$ for small $t$. Each of these individuals acquires a second mutation at
rate $\mu$, so

$$E[X_2(t)] \approx \mu\int_0^t N\mu s\,ds = \frac{N\mu^2 t^2}{2}.$$

Repeating this reasoning, we get $E[X_j(t)] \approx N\mu^j t^j/j!$.
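
The repeated integration can be carried out numerically as a check on the closed form. The following sketch is illustrative and not part of the paper: it starts from $E[X_1(s)] \approx N\mu s$ and applies $E[X_j(t)] \approx \mu\int_0^t E[X_{j-1}(s)]\,ds$ with the trapezoid rule. The function name `iterated_expected_counts` and the grid size are assumptions.

```python
import math


def iterated_expected_counts(N, mu, j_max, t, steps=2000):
    """Approximate E[X_j(t)] for j = 1..j_max by repeated numerical integration."""
    h = t / steps
    grid = [i * h for i in range(steps + 1)]
    current = [N * mu * s for s in grid]  # leading-order E[X_1(s)]
    results = {1: current[-1]}
    for j in range(2, j_max + 1):
        nxt = [0.0]
        for i in range(1, steps + 1):
            # Trapezoid rule for mu * integral of E[X_{j-1}(s)] over one grid cell.
            nxt.append(nxt[-1] + mu * 0.5 * h * (current[i - 1] + current[i]))
        current = nxt
        results[j] = current[-1]
    return results
```

The values returned for each `j` can be compared with `N * (mu * t) ** j / math.factorial(j)`; they agree up to the discretization error of the trapezoid rule.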

When the mutation rate is sufficiently large, there is a Law of Large Numbers, and the 
fluctuations in the number of individuals with j mutations are small relative to E[Xj(t)]. In 
this case, Xj(t) is well approximated by its expectation. When the mutation rate is sufficiently 
small, most of the time there are no individuals with j mutations in the population, and when 
an individual gets a $j$th mutation, this mutation either dies out or, with probability $q_{m-j+1}$,
produces a type $m$ descendant on a time scale much faster than $\tau_m$. In this case, the problem
reduces to determining how long we have to wait for a jth mutation that is successful enough to 
produce a type m descendant. There is also a borderline case in which we get stochastic effects 
in the limit both from the number of type j individuals in the population and from the time 
between the appearance of a type j individual that will eventually have a type m descendant and 
the birth of the type m descendant. 

If the mutation rate is fast enough so that $X_{m-1}(t) \approx E[X_{m-1}(t)]$ up to time $\tau_m$, then since
each individual with $m-1$ mutations gets an $m$th mutation at rate $\mu$, we get

$$P(\tau_m > t) \approx \exp\left(-\mu\int_0^t \frac{N\mu^{m-1}s^{m-1}}{(m-1)!}\,ds\right) = \exp\left(-\frac{N\mu^m t^m}{m!}\right). \tag{5}$$

This leads to the result in part 1 of Theorem 4 if we substitute $N^{-1/m}\mu^{-1}t$ in place of $t$ in (5).
In this regime, mutations are happening fast enough that births and deaths do not affect the
limiting result, and we get the same result that we would get if $\tau_m$ were simply the first time
that one of $N$ independent rate $\mu$ Poisson processes reaches the value $m$. Consequently, as can
be seen by integrating (1), this result agrees with the result of Armitage and Doll [2], who did
not consider cell division and cell death in their original model. The result when $m = 2$ agrees
with a result in section 4 of [16], and with (12.18) of [30].
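
The comparison with independent Poisson processes is easy to examine numerically. In the sketch below, which is an illustration with hypothetical function names, the probability that none of $N$ independent rate $\mu$ Poisson processes has reached $m$ by time $t = N^{-1/m}\mu^{-1}s$ is computed exactly and compared with the limit $\exp(-s^m/m!)$. Note that only the product $\mu t = N^{-1/m}s$ enters.

```python
import math


def poisson_tail(m, lam, terms=60):
    """P(Poisson(lam) >= m), summed directly to avoid cancellation for small lam."""
    term = math.exp(-lam) * lam ** m / math.factorial(m)
    total = 0.0
    for i in range(m, m + terms):
        total += term
        term *= lam / (i + 1)
    return total


def survival_independent_processes(N, m, s):
    """P(no process has reached m by time t), where t = N**(-1/m) * s / mu."""
    lam = s * N ** (-1.0 / m)  # this is mu * t
    return math.exp(N * math.log1p(-poisson_tail(m, lam)))


def limiting_survival(m, s):
    """The limit exp(-s**m / m!) from part 1 of Theorem 4."""
    return math.exp(-s ** m / math.factorial(m))
```

Evaluating `survival_independent_processes(N, 3, 1.5)` for increasing `N` shows the values approaching `limiting_survival(3, 1.5)`.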

Next, suppose mutation rates are fast enough so that $X_{m-j-1}(t) \approx E[X_{m-j-1}(t)]$ up to time
$\tau_m$, but slow enough that the time between the appearance of a "successful" type $m-j$ individual
that will have a type $m$ descendant and the birth of the type $m$ descendant is small relative to $\tau_m$.
Then each type $m-j-1$ individual experiences "successful" mutations at rate $\mu q_{j+1} \approx \mu^{2-2^{-j}}$
by Proposition 1, so

$$P(\tau_m > t) \approx \exp\left(-\mu^{2-2^{-j}}\int_0^t \frac{N\mu^{m-j-1}s^{m-j-1}}{(m-j-1)!}\,ds\right) = \exp\left(-\frac{N\mu^{m-j+1-2^{-j}}t^{m-j}}{(m-j)!}\right).$$

This leads to the result in part 2 of Theorem 4. The borderline cases are handled by part 3 of
Theorem 4.

To understand where the boundaries between the different types of behavior occur, first
recall that the number of type $k$ individuals born by time $t$ is of the order $N\mu^k t^k$. Because each
individual gives birth and dies at approximately rate one, the number of births and deaths of
type $k$ individuals by time $t$ is of order $N\mu^k t^{k+1}$. Because the standard deviation of the position
of a random walk after $M$ steps is of order $M^{1/2}$, the standard deviation of the number of type $k$
individuals by time $t$ is of order $N^{1/2}\mu^{k/2}t^{(k+1)/2}$. Therefore, we have $X_k(t) \approx E[X_k(t)]$ whenever
$N^{1/2}\mu^{k/2}t^{(k+1)/2} \ll N\mu^k t^k$ or, equivalently, whenever $1 \ll N\mu^k t^{k-1}$. See Proposition 11 below
for a precise statement of this result.

Each type $k$ individual experiences a mutation that will have a type $m$ descendant at rate
$\mu q_{m-k} \approx \mu^{2-2^{-(m-k-1)}}$. Therefore, the expected number of such mutations by time $t$ is of the
order $N\mu^k t^k \cdot \mu^{2-2^{-(m-k-1)}} \cdot t = N\mu^{k+2-2^{-(m-k-1)}}t^{k+1}$. This expression is of order one when $t$ is of
order $N^{-1/(k+1)}\mu^{-1-(1-2^{-(m-k-1)})/(k+1)}$, which is consequently the order of magnitude of the time
we have to wait for one such mutation to occur. It now follows from the result of the previous
paragraph that $X_k(t) \approx E[X_k(t)]$ up to time $\tau_m$ whenever

$$1 \ll N\mu^k\left(N^{-1/(k+1)}\mu^{-1-(1-2^{-(m-k-1)})/(k+1)}\right)^{k-1}. \tag{6}$$

The expression on the right-hand side of (6) can be simplified to $(N^2\mu^{2+(k-1)2^{-(m-k-1)}})^{1/(k+1)}$,
so (6) is equivalent to the condition

$$\mu \gg N^{-1/(1+(k-1)2^{-(m-k)})}. \tag{7}$$

This condition can be compared to the condition for part 2 of Theorem 4, which entails that (7)
holds for $k = m-j-1$ but not for $k = m-j$, and therefore the number of type $m-j-1$
individuals, but not the number of type $m-j$ individuals, is approximately deterministic through
time $\tau_m$.

If instead $\mu$ is of the order $N^{-1/(1+(m-j-1)2^{-j})}$ for some $j = 1, \ldots, m-1$, then on the relevant
time scale the number of individuals of type $m-j-1$ behaves deterministically, but the number
of individuals of type $m-j$ has fluctuations of the same order as the expected value. As a
result, there are stochastic effects from the number of type $m-j$ individuals in the population.
In this case, there are also stochastic effects from the time between the birth of a type $m-j$
individual that will have a type $m$ descendant and the time that the type $m$ descendant is born.
Calculating the form of the limiting distribution in these borderline cases involves working with
a two-type branching process. This branching process is very similar to a process analyzed in
chapter 3 of [33], which explains the resemblance between part 3 of Theorem 4 and (3.20) of
[33]. Similar analysis using generating functions of branching processes that arise in multi-stage
models of cancer has been carried out in [23], [25], [26]. The work in [25] allows for time-dependent
parameters, while a three-stage model is analyzed in [26].

2.4 The case m = 3

To help the reader understand the different limiting behaviors, we summarize here the results
when $m = 3$. There are 9 different limiting regimes in this case; in general for the waiting time
to get $m$ mutations, there are $4m - 3$ limiting regimes. Below $Z_1$ and $Z_2$ have the exponential
distribution with mean one, and $Y_1$ and $Y_2$ have the exponential distributions with rates $\lambda_2$ and
$\lambda_3$ respectively, where $\lambda_2$ and $\lambda_3$ are given by (3). The random variables $Z_1$, $Z_2$, $Y_1$, and $Y_2$ are
assumed to be independent.

• If $\mu \ll N^{-2}$, then by part 1 of Theorem 2, $\mu\tau_3 \to_d Z_1 + Z_2$. We wait for two fixations, and
then the third mutation happens quickly.

• If $\mu \sim AN^{-2}$, then by the $j = 2$ case of Theorem 3, $\mu\tau_3 \to_d Z_1 + Y_1$. We wait for
one fixation, then either a second fixation (after which the third mutation would happen
quickly) or a second mutation that will not fixate but will have a descendant that gets a
third mutation.

• If $N^{-2} \ll \mu \ll N^{-4/3}$, then by the $j = 2$ case of part 2 of Theorem 2, $\mu\tau_3 \to_d Z_1$. We wait
for one fixation, and then the other two mutations happen quickly.

• If $\mu \sim AN^{-4/3}$, then by the $j = 3$ case of Theorem 3, $\mu\tau_3 \to_d Y_2$. We wait either for a
fixation (after which the other two mutations would happen quickly) or a mutation that
will not fixate but will have a descendant with two additional mutations.

• If $N^{-4/3} \ll \mu \ll N^{-1}$, then by part 3 of Theorem 2, $N\mu^{7/4}\tau_3 \to_d Z_1$. Fixation does not
happen before time $\tau_3$, but we wait an exponentially distributed time for a mutation that
is successful enough to have a descendant with three mutations.

• If $\mu \sim AN^{-1}$, then by the $j = 2$ case of part 3 of Theorem 4,

$$P(\mu^{3/4}\tau_3 > t) \to \exp\left(-A\int_0^t \frac{1-e^{-2s}}{1+e^{-2s}}\,ds\right).$$

• If $N^{-1} \ll \mu \ll N^{-2/3}$, then by the $j = 1$ case of part 2 of Theorem 4, $P(N^{1/2}\mu^{5/4}\tau_3 > t) \to
\exp(-t^2/2)$. The number of individuals with one mutation is approximately deterministic,
and the stochastic effect comes from waiting for a second mutation that is successful enough
to have a descendant with a third mutation.

• If $\mu \sim AN^{-2/3}$, then by the $j = 1$ case of part 3 of Theorem 4,

$$P(\mu^{1/2}\tau_3 > t) \to \exp\left(-A^{3/2}\int_0^t (t-s)\,\frac{1-e^{-2s}}{1+e^{-2s}}\,ds\right).$$

• If $\mu \gg N^{-2/3}$, then by part 1 of Theorem 4, $P(N^{1/3}\mu\tau_3 > t) \to \exp(-t^3/6)$. The number
of individuals with two mutations is approximately deterministic, and the stochastic effect
comes from waiting for the third mutation.
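
The same bookkeeping extends to general $m$. The sketch below is an illustration, with hypothetical function names: it lists the exponents $e$ for which $\mu$ of order $N^e$ is a borderline case, taking $e = -2^{j-1}/(2^{j-1}-1)$ for $j = 2, \ldots, m$ from Theorem 3 and $e = -1/(1+(m-j-1)2^{-j})$ for $j = 1, \ldots, m-1$ from part 3 of Theorem 4, and then counts the regimes as one per borderline case plus one per open interval between or beyond them.

```python
from fractions import Fraction


def boundary_exponents(m):
    """Sorted exponents e such that mu of order N**e is a borderline case."""
    # Borderline cases between fixation and tunneling (Theorem 3).
    slow = [Fraction(-(2 ** (j - 1)), 2 ** (j - 1) - 1) for j in range(2, m + 1)]
    # Borderline cases in the rapid mutation regime (Theorem 4, part 3).
    fast = [-1 / (1 + (m - j - 1) * Fraction(1, 2 ** j)) for j in range(1, m)]
    return sorted(slow + fast)


def number_of_regimes(m):
    """Borderline cases plus the open intervals they delimit."""
    return 2 * len(boundary_exponents(m)) + 1
```

For $m = 3$ the boundaries are $-2$, $-4/3$, $-1$, and $-2/3$, giving the 9 regimes listed above, and `number_of_regimes(m)` equals $4m - 3$ in general.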

2.5 Power law asymptotics and implications for cancer modeling

Because the probability that an individual develops a particular type of cancer during his or her
lifetime is small, it seems unlikely that it will be possible to observe the full limiting distribution
of the waiting time for cancer from data on cancer incidence. Instead, we will observe only the
left tail of this distribution. Consequently, what is likely to be most relevant for applications
are asymptotic formulas as $t \to 0$. Throughout this subsection, write $f(t) \approx g(t)$ to mean that
$f(t)/g(t) \to 1$ as $t \to 0$. Recall that if $S_j$ is the sum of $j$ independent exponential random
variables with mean one, then $P(S_j \leq t) \approx t^j/j!$. This fact, combined with the approximation
$1 - \exp(-t^{m-j}/(m-j)!) \approx t^{m-j}/(m-j)!$, allows us to deduce the following corollary of Theorems
2 and 4.

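Corollary 5. We have the following asymptotic formulas as $t \to 0$:

1. If $\mu \ll N^{-2}$, then

$$\lim_{N \to \infty} P(\tau_m \le \mu^{-1}t) \sim \frac{t^{m-1}}{(m-1)!}.$$

2. If $N^{-2^{j-1}/(2^{j-1}-1)} \ll \mu \ll N^{-2^j/(2^j-1)}$ for some $j = 2, \dots, m-1$, then

$$\lim_{N \to \infty} P(\tau_m \le \mu^{-1}t) \sim \frac{t^{m-j}}{(m-j)!}.$$

3. If $N^{-2^{m-1}/(2^{m-1}-1)} \ll \mu \ll N^{-1}$, then

$$\lim_{N \to \infty} P(\tau_m \le N^{-1}\mu^{-2+2^{-(m-1)}}t) \sim t.$$

4. If $N^{-1/(1+(m-j-2)2^{-(j+1)})} \ll \mu \ll N^{-1/(1+(m-j-1)2^{-j})}$ for some $j = 1, \dots, m-2$, then

$$\lim_{N \to \infty} P(\tau_m \le N^{-1/(m-j)}\mu^{-1-(1-2^{-j})/(m-j)}t) \sim \frac{t^{m-j}}{(m-j)!}.$$

5. If $\mu \gg N^{-2/m}$, then

$$\lim_{N \to \infty} P(\tau_m \le N^{-1/m}\mu^{-1}t) \sim \frac{t^m}{m!}.$$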
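
The case analysis in Corollary 5 can be made concrete with a short sketch. The code below assumes, purely for illustration, that the mutation rate is parametrized as $\mu = N^{-a}$ for a fixed exponent $a > 0$; this parametrization, the function name `corollary5_regime`, and the use of exact rationals are choices made here and do not appear in the text. It returns the part of the corollary that applies together with the power of $t$ governing the left tail. Exponents lying exactly on a boundary between regimes return `None`, because those cases fall under Theorem 3 and part 3 of Theorem 4 rather than under the corollary.

```python
from fractions import Fraction


def corollary5_regime(m, a):
    """Classify mu = N**(-a) for an m-stage model (assumes m >= 2 and a > 0).

    Returns (part, power): the part of Corollary 5 that applies and the
    exponent p such that the left tail behaves like t**p / p!.
    Returns None when a sits exactly on a boundary between two regimes.
    """
    a = Fraction(a)
    # Part 1: mu << N^{-2}.
    if a > 2:
        return (1, m - 1)
    # Part 2: N^{-2^{j-1}/(2^{j-1}-1)} << mu << N^{-2^j/(2^j-1)}, j = 2..m-1.
    for j in range(2, m):
        lower = Fraction(2**j, 2**j - 1)
        upper = Fraction(2**(j - 1), 2**(j - 1) - 1)
        if lower < a < upper:
            return (2, m - j)
    # Part 3: N^{-2^{m-1}/(2^{m-1}-1)} << mu << N^{-1}.
    if 1 < a < Fraction(2**(m - 1), 2**(m - 1) - 1):
        return (3, 1)
    # Part 4: exponents between 1/(1+(m-j-1)2^{-j}) and 1/(1+(m-j-2)2^{-(j+1)}).
    for j in range(1, m - 1):
        lower = 1 / (1 + Fraction(m - j - 1, 2**j))
        upper = 1 / (1 + Fraction(m - j - 2, 2**(j + 1)))
        if lower < a < upper:
            return (4, m - j)
    # Part 5: mu >> N^{-2/m}.
    if a < Fraction(2, m):
        return (5, m)
    return None
```

For $m = 3$ this reproduces the list of cases given before the corollary: for instance, `corollary5_regime(3, Fraction(3, 2))` falls under part 2 with power 1, and `corollary5_regime(3, Fraction(5, 6))` falls under part 4 with power 2.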

By integrating (1), we see that the result in part 5 of the corollary, which says that the
probability of getting cancer by time $t$ behaves like $Ct^m$, agrees with the result of Armitage and
Doll. However, parts 1 through 4 of the corollary show that in an $m$-stage model of cancer, the
probability of getting cancer by time $t$ could behave like $Ct^j$ for any $j = 1, 2, \dots, m$, depending
on the relationship between $\mu$ and $N$. This range of behavior can occur because not all of the
$m$ events required for cancer are necessarily "rate limiting". For example, when part 2 of the
corollary applies, there are $m - j$ fixations, and then the remaining $j$ mutations happen on a much
faster time scale. Consequently, it is not possible to deduce the number of mutations required
for cancer just from the power law relationship between age and cancer incidence.

Corollary 5 also shows that in our $m$-stage model, the probability of getting cancer by time
$t$ will never behave like $Ct^j$ for $j > m$. However, as noted by Armitage and Doll (see [2]),
higher powers could arise if the mutation rate, instead of being constant, increases over time like
a power of $t$. Also, the probability of getting cancer by time $t$ could increase more rapidly than
$t^m$ if cells with mutations have a selective advantage over other cells, allowing their number to
increase more rapidly than our model predicts. This explains, in part, the success of two-stage
models in fitting a wide variety of cancer incidence data, as documented in [23].



3 Proof of Theorems 2 and 3

Recall that part 3 of Theorem 2 is a special case of Theorem 2 of [10], so we need to prove only
parts 1 and 2. We begin by recording three lemmas. Lemma 6, which just restates (3.6), (3.8),
and Lemma 3.1 of [10], bounds the amount of time that a mutation is in the population before
it dies out or fixates. Lemma 7 complements Proposition 1. Lemma 8 is a direct consequence of
part 3 of Theorem 2. In these lemmas and throughout the rest of the paper, $C$ denotes a positive
constant not depending on $N$ whose value may change from line to line.

Lemma 6. Consider a model of a population of size $N$ in which all individuals are either type
0 or type 1. The population starts with just one type 1 individual and evolves according to the
Moran model, so each individual dies at rate one and then gets replaced by a randomly chosen
individual from the population. Let $X(t)$ be the number of type 1 individuals at time $t$. Let
$T = \inf\{t : X(t) \in \{0, N\}\}$. Let $L_k$ be the Lebesgue measure of $\{t : X(t) = k\}$. Then for
$k = 1, \dots, N-1$,

$$E[L_k] = \frac{1}{k}. \tag{8}$$

Also,

$$E[T] \le C \log N \tag{9}$$

and for all $0 \le t \le N$,

$$P(T > t) \le C/t. \tag{10}$$
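
The identity $E[L_k] = 1/k$ can be checked exactly for small $N$ by solving the linear system satisfied by the expected occupation times. The sketch below is an illustration rather than part of the argument. It assumes that with $k$ type 1 individuals the chain jumps to $k+1$ and to $k-1$ at rate $k(N-k)/N$ each, which corresponds to the replacement being drawn uniformly from all $N$ individuals (one reading of the model in Lemma 6). The function name `occupation_times` is hypothetical.

```python
from fractions import Fraction


def occupation_times(N):
    """Return [E[L_1], ..., E[L_{N-1}]] for the chain started at X(0) = 1.

    Assumption: from state k the chain moves up or down at rate k(N-k)/N each.
    The vector g of expected occupation times solves g(-Q) = e_1, where Q is
    the generator restricted to the transient states 1, ..., N-1.
    """
    n = N - 1
    r = [Fraction(k * (N - k), N) for k in range(1, N)]  # r[k-1] = rate at state k
    # Row y of A encodes: -r_{y-1} g_{y-1} + 2 r_y g_y - r_{y+1} g_{y+1} = [y == 1].
    A = [[Fraction(0)] * n for _ in range(n)]
    for y in range(n):
        A[y][y] = 2 * r[y]
        if y > 0:
            A[y][y - 1] = -r[y - 1]
        if y < n - 1:
            A[y][y + 1] = -r[y + 1]
    b = [Fraction(1)] + [Fraction(0)] * (n - 1)
    # Gaussian elimination in exact rational arithmetic.
    for col in range(n):
        piv = next(row for row in range(col, n) if A[row][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            if f != 0:
                for c in range(col, n):
                    A[row][c] -= f * A[col][c]
                b[row] -= f * b[col]
    # Back substitution.
    g = [Fraction(0)] * n
    for row in range(n - 1, -1, -1):
        s = b[row] - sum(A[row][c] * g[c] for c in range(row + 1, n))
        g[row] = s / A[row][row]
    return g
```

Under the stated assumption the solution is exactly $1/k$ for each $k$, so the sum of the entries is the harmonic number $1 + 1/2 + \cdots + 1/(N-1)$, which grows like $\log N$ in agreement with (9).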

Lemma 7. Consider the model of Proposition 1. Let $q'_m$ be the probability that a type $m$ individual
is born at some time, but that eventually all individuals have type zero. Suppose $N\mu^{1-2^{-(m-1)}} \to 0$
as $N \to \infty$. Then

$$q'_m \ll 1/N.$$

Proof. The event that all individuals eventually have type zero has probability (N — 1)/N re- 
gardless of the mutation rate. On this event, reducing the mutation rate can only reduce the 
probability of eventually getting a type m individual. Therefore, it suffices to prove the result 
when 

$$N\mu^{1-2^{-(m-2)}} \to \infty. \tag{11}$$

If a type $m$ individual eventually is born, then some type 2 mutation must have a type $m$
descendant. By (8), for $k = 1, \dots, N-1$, the expected amount of time for which there are $k$
individuals of nonzero type is $1/k$. While there are $k$ individuals of nonzero type, type 2 mutations
occur at rate at most $k\mu$. On the event that there is no fixation, the number of individuals of
nonzero type never reaches $N$, and the expected number of type 2 mutations while there are
fewer than $N$ individuals of nonzero type is at most

$$\sum_{k=1}^{N-1} \frac{1}{k} \cdot k\mu \le N\mu.$$

When (11) holds, we can apply Proposition 1 to see that if $m \ge 3$ then each type 2 mutation has
probability at most $C\mu^{1-2^{-(m-2)}}$ of having a type $m$ descendant. This inequality holds trivially
if $m = 2$. It follows that

$$q'_m \le (N\mu)(C\mu^{1-2^{-(m-2)}}) = CN\mu^{2-2^{-(m-2)}},$$

and therefore $Nq'_m \le C(N\mu^{1-2^{-(m-1)}})^2 \to 0$, as claimed. □

Lemma 8. Suppose $j \ge 2$. If $N^{-2^{j-1}/(2^{j-1}-1)} \ll \mu \ll 1/N$, then for all $\varepsilon > 0$,

$$\lim_{N \to \infty} P(\tau_j < \varepsilon\mu^{-1}) = 1.$$

Proof. Part 3 of Theorem 2 gives $\lim_{N \to \infty} P(N\mu^{2-2^{-(j-1)}}\tau_j \le t) = 1 - e^{-t}$ for all $t > 0$. The
result follows immediately because $\mu \ll N\mu^{2-2^{-(j-1)}}$ by assumption. □

Proof of parts 1 and 2 of Theorem 2. Suppose either $j = 1$ and $\mu \ll N^{-2}$, or $j = 2, \dots, m-1$
and $N^{-2^{j-1}/(2^{j-1}-1)} \ll \mu \ll N^{-2^j/(2^j-1)}$. Let $\gamma_i$ be the time of the $i$th mutation, so the points
$(\gamma_i)_{i=1}^\infty$ form a rate $N\mu$ Poisson process on $[0, \infty)$. Call the $i$th mutation bad if at time $\gamma_i$, there
is another mutation in the population that has not yet died out or fixated. Otherwise, call the
mutation good. For all $i$, let $\xi_i = 1$ if the $i$th mutation fixates, and let $\xi_i = 0$ otherwise. We have
$P(\xi_i = 1) = 1/N$ for all $i$, but the random variables $(\xi_i)_{i=1}^\infty$ are not independent because if two
mutations are present at the same time on different individuals, at most one of the mutations
can fixate.

Let $(\tilde\xi_i)_{i=1}^\infty$ be a sequence of i.i.d. random variables, independent of the population process,
such that $P(\tilde\xi_i = 1) = 1/N$ and $P(\tilde\xi_i = 0) = (N-1)/N$ for all $i$. Define another sequence
$(\xi'_i)_{i=1}^\infty$ such that $\xi'_i = \xi_i$ if the $i$th mutation is good and $\xi'_i = \tilde\xi_i$ if the $i$th mutation is bad.
If the $i$th mutation is good, then $P(\xi_i = 1 \mid (\xi'_k)_{k=1}^{i-1}) = 1/N$, so $(\xi'_i)_{i=1}^\infty$ is an i.i.d. sequence.
Let $\sigma_1 = \inf\{\gamma_i : \xi_i = 1\}$ and for $k \ge 2$, let $\sigma_k = \inf\{\gamma_i > \sigma_{k-1} : \xi_i = 1\}$. Likewise, let
$\sigma'_1 = \inf\{\gamma_i : \xi'_i = 1\}$ and for $k \ge 2$, let $\sigma'_k = \inf\{\gamma_i > \sigma'_{k-1} : \xi'_i = 1\}$. The points $\gamma_i$ for which
$\xi'_i = 1$ form a Poisson process of rate $\mu$, so $\mu\sigma'_{m-j}$ has the gamma distribution with parameters
$(m-j, 1)$.

Let $\varepsilon > 0$, and choose $t$ large enough that

$$P(\sigma'_{m-j} > \mu^{-1}t) < \varepsilon. \tag{12}$$

Note that because $\mu\sigma'_{m-j}$ has a gamma distribution for all $N$, here $t$ does not depend on $N$.
The expected number of mutations by time $\mu^{-1}t$ is $(N\mu)(\mu^{-1}t) = Nt$. After a mutation occurs,
the number of individuals descended from this mutant individual evolves in the same way as the
number of type 1 individuals in Lemma 6. Therefore, by (9), the expected amount of time, before
time $\mu^{-1}t$, that there is a mutation in the population that has not yet disappeared or fixated is
at most $C(N \log N)t$. Therefore, the expected number of bad mutations before time $\mu^{-1}t$ is at
most $(N\mu)(C(N \log N)t) = C(N^2 \log N)\mu t$. If a bad mutation occurs at time $\gamma_i$, the probability
that either $\xi_i$ or $\xi'_i$ equals one is at most $2/N$, so

$$P(\xi_i = \xi'_i \text{ for all } i \text{ such that } \gamma_i \le \mu^{-1}t) \ge 1 - 2C(N \log N)\mu t.$$

Because $\mu \ll 1/(N \log N)$, it follows by letting $\varepsilon \to 0$ that

$$\lim_{N \to \infty} P(\sigma'_{m-j} = \sigma_{m-j}) = 1. \tag{13}$$

Thus, $\mu\sigma_{m-j} \to_d S_{m-j}$. To complete the proof, it remains to show that

$$\mu(\tau_m - \sigma_{m-j}) \to_p 0. \tag{14}$$

We first prove that

$$\lim_{N \to \infty} P(\tau_m < \sigma_{m-j}) = 0. \tag{15}$$

If $\tau_m < \sigma_{m-j}$, then before time $\sigma_{m-j}$, there must be a type $k$ mutation for some $k \le m-j$
that does not fixate but has a type $m$ descendant. We will bound the probability of this event.
Recall that the expected number of mutations before time $\mu^{-1}t$ is $Nt$. Because $\mu \ll N^{-2^j/(2^j-1)}$,
we can apply Lemma 7 with $j+1$ in place of $m$ to get that the probability that a type $m-j$
mutation does not fixate but has a type $m$ descendant is asymptotically much smaller than $1/N$.
Thus, the probability that before time $\mu^{-1}t$, there is a type $k$ mutation for some $k \le m-j$ that
does not fixate but has a type $m$ descendant is asymptotically much smaller than $(Nt)(1/N)$,
and therefore goes to zero as $N \to \infty$. Combining this result with (12) and (13) gives (15).

We now prove (14). Choose $\varepsilon > 0$. Let $\bar\gamma_i$ be the time when the mutation at time $\gamma_i$
disappears or fixates. By (9), we have $E[\bar\gamma_i - \gamma_i] \le C \log N$. It follows from Markov's Inequality
that $P(\bar\gamma_i - \gamma_i > \mu^{-1}\varepsilon) \le C \log N/(\mu^{-1}\varepsilon)$. Because the expected number of mutations by time
$\mu^{-1}t$ is $Nt$, another application of Markov's Inequality gives

$$P(\bar\gamma_i - \gamma_i > \mu^{-1}\varepsilon \text{ for some } i \text{ such that } \gamma_i < \mu^{-1}t) \le Nt \cdot \frac{C \log N}{\mu^{-1}\varepsilon} = \frac{Ct}{\varepsilon}(N \log N)\mu,$$

which goes to zero as $N \to \infty$. Therefore, in view of (12) and (13), if $\zeta$ is the time when the
mutation at time $\sigma_{m-j}$ fixates, we have

$$\mu(\zeta - \sigma_{m-j}) \to_p 0. \tag{16}$$

Now (14) will be immediate from (15) and (16) once we show that for all $\varepsilon > 0$,

$$\lim_{N \to \infty} P(\mu(\tau_m - \zeta) > \varepsilon) = 0. \tag{17}$$

When $j \ge 2$, equation (17) follows from Lemma 8 because after time $\sigma_{m-j}$, at most $j$ more
mutations are needed before we reach time $\tau_m$. When $j = 1$, we reach the time $\tau_m$ as soon as there
is another mutation after time $\sigma_{m-j}$, so $\tau_m - \zeta$ is stochastically dominated by an exponentially
distributed random variable with rate $N\mu$. It follows that (17) holds in this case as well. □

Most of the work involved in proving Theorem 3 is contained in the proof of the following
result, which is a special case of Lemma 7.1 of [10].

Lemma 9. Suppose $\mu \sim AN^{-2^{j-1}/(2^{j-1}-1)}$ for some $j = 2, \dots, m$ and some constant $A > 0$.
Consider the model of Proposition 1. Let $q'_j$ be the probability that either a type $j$ individual is
born at some time, or eventually all individuals in the population have type greater than zero.
Then $\lim_{N \to \infty} Nq'_j = \lambda_j$, where $\lambda_j > 1$ is given by (3).

Proof of Theorem 3. The proof is similar to the proof of parts 1 and 2 of Theorem 2. Define
the sequences $(\gamma_i)_{i=1}^\infty$, $(\xi_i)_{i=1}^\infty$, $(\tilde\xi_i)_{i=1}^\infty$, and $(\xi'_i)_{i=1}^\infty$ as in the proof of parts 1 and 2 of Theorem
2. Also define a sequence $(\zeta_i)_{i=1}^\infty$ of $\{0, 1\}$-valued random variables such that $\zeta_i = 1$ if the
mutation at time $\gamma_i$ either fixates or has a descendant that gets $j-1$ additional mutations. Let
$(\tilde\zeta_i)_{i=1}^\infty$ be a sequence of i.i.d. random variables, independent of the population process, such that
$P(\tilde\zeta_i = 1) = \lambda_j/N$ and $P(\tilde\zeta_i = 0) = (N - \lambda_j)/N$ for all $i$, and $\tilde\zeta_i = 1$ whenever $\tilde\xi_i = 1$. Let $\zeta'_i = \zeta_i$
if the $i$th mutation is good, and let $\zeta'_i = \tilde\zeta_i$ otherwise. Let $\sigma_0 = 0$. For $k = 1, \dots, m-j$, let
$\sigma_k = \inf\{\gamma_i > \sigma_{k-1} : \xi_i = 1\}$. Let $\sigma_{m-j+1} = \inf\{\gamma_i > \sigma_{m-j} : \zeta_i = 1\}$. Define $\sigma'_1, \dots, \sigma'_{m-j+1}$ in
the same way using the random variables $\xi'_i$ and $\zeta'_i$. It is clear from the construction that $\mu\sigma'_{m-j+1}$
has the same distribution as $S_{m-j} + Y$. By the same argument used in the proof of parts 1 and
2 of Theorem 2, with a bound of $2\lambda_j/N$ replacing the bound of $2/N$, we get

$$\lim_{N \to \infty} P(\sigma'_{m-j+1} = \sigma_{m-j+1}) = 1,$$

which implies $\mu\sigma_{m-j+1} \to_d S_{m-j} + Y$. This argument also gives that the mutation at time
$\sigma_{m-j+1}$ is good with probability tending to one as $N \to \infty$.
We next claim that

$$\lim_{N \to \infty} P(\tau_m < \sigma_{m-j+1}) = 0. \tag{18}$$

If $\sigma_{m-j} < \gamma_i < \sigma_{m-j+1}$, then by the definition of $\sigma_{m-j+1}$, no descendant of the mutation at time
$\gamma_i$ can have a type $m$ descendant. Therefore, if $\tau_m < \sigma_{m-j+1}$, then before time $\sigma_{m-j}$ there must
be a type $k$ mutation for some $k \le m-j$ that does not fixate but has a type $m$ descendant.
Because $\mu \ll N^{-2^j/(2^j-1)}$, the probability of this event goes to zero by the same argument given
in the proof of parts 1 and 2 of Theorem 2, which implies (18).
It remains only to prove

$$\mu(\tau_m - \sigma_{m-j+1}) \to_p 0. \tag{19}$$

Let $\varepsilon > 0$, and choose $t$ large enough that $P(\sigma'_{m-j+1} > \mu^{-1}t) < \varepsilon$. Let $\delta > 0$. By the same
argument given in the proof of parts 1 and 2 of Theorem 2, the probability that some mutation
before time $\mu^{-1}t$ takes longer than $\mu^{-1}\delta$ to die out or fixate tends to zero as $N \to \infty$. Therefore,
if $\zeta$ is the time when the mutation at time $\sigma_{m-j+1}$ dies out or fixates, then $\mu(\zeta - \sigma_{m-j+1}) \to_p 0$.
If the mutation at time $\sigma_{m-j+1}$ fixates, then only $j-1$ more mutations are needed before we
reach time $\tau_m$. Therefore, conditional on this fixation, when $j \ge 3$ we get $\mu(\tau_m - \zeta) \to_p 0$ by
applying Lemma 8 with $j-1$ in place of $j$, while the result $\mu(\tau_m - \zeta) \to_p 0$ is immediate when
$j = 2$. Alternatively, if the mutation at time $\sigma_{m-j+1}$ does not fixate and the mutation at time
$\sigma_{m-j+1}$ is good, then $\tau_m \le \zeta$. Because the mutation at time $\sigma_{m-j+1}$ is good with probability
tending to one as $N \to \infty$, we conclude (19). □

4 Proof of parts 1 and 2 of Theorem 4

The first step in the proof of Theorem 4 is to establish conditions, stated in Proposition 11 below,
under which the number of type $k$ individuals is essentially deterministic, in the sense that it can
be well approximated by its expectation. It will follow that when $\mu \gg N^{-2/m}$, the number of
individuals with type $m-1$ is approximately deterministic until time $\tau_m$. Since each type $m-1$
individual experiences a type $m$ mutation at rate $\mu$, the approximately deterministic behavior
of the type $m-1$ individuals leads easily to a proof of part 1 of Theorem 4. When instead
$N^{-1/(1+(m-j-2)2^{-(j+1)})} \ll \mu \ll N^{-1/(1+(m-j-1)2^{-j})}$, the number of individuals of type $m-j-1$ is
approximately deterministic up to time $\tau_m$, as will be shown in Lemma 12 below. The remainder
of the proof of part 2 of Theorem 4 involves using a Poisson approximation technique to calculate
the distribution of the time we have to wait for one of the type $m-j-1$ individuals to have a
type $m-j$ mutation that will give rise to a type $m$ descendant.

We begin with a lemma bounding the expected number of type $k$ individuals. Recall that
$X_j(t)$ denotes the number of type $j$ individuals at time $t$, and $X_j(0) = 0$ for all $j \ge 1$.

Lemma 10. Let $Y_k(t) = \sum_{j=k}^\infty X_j(t)$ be the number of individuals of type $k$ or higher at time $t$.
For all $k \ge 0$ and $t \ge 0$, we have $E[X_k(t)] \le E[Y_k(t)] \le N\mu^k t^k/k!$.

Proof. The first inequality is obvious, so it suffices to show $E[Y_k(t)] \le N\mu^k t^k/k!$. We proceed
by induction. Since $Y_0(t) \le N$ for all $t \ge 0$, the result is true for $k = 0$. Suppose $k \ge 1$ and
$E[Y_{k-1}(t)] \le N\mu^{k-1}t^{k-1}/(k-1)!$ for all $t \ge 0$. The expected number of type $k$ mutations before
time $t$ is at most

$$\mu \int_0^t E[X_{k-1}(s)]\,ds \le \int_0^t \frac{N\mu^k s^{k-1}}{(k-1)!}\,ds = \frac{N\mu^k t^k}{k!}.$$

Because individuals of type $k$ and higher give birth and die at the same rate, it follows that
$E[Y_k(t)] \le N\mu^k t^k/k!$. □

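Proposition 11. Suppose $k \ge 0$ and $T$ is a time that depends on $N$. Assume that as $N \to \infty$,
we have $\mu T \to 0$, $N\mu^k T^{k-1} \to \infty$, and $N\mu^k T^k \to \infty$. Then for all $\varepsilon > 0$,

$$\lim_{N \to \infty} P\left(\max_{0 \le t \le T} \left|X_k(t) - \frac{N\mu^k t^k}{k!}\right| > \varepsilon N\mu^k T^k\right) = 0. \tag{20}$$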


Proof. We prove the result by induction and begin with $k = 0$. Individuals of type one or
higher are always being born and dying at the same rate. Since new individuals of type one or
higher also appear because of type 1 mutations, the process $(N - X_0(t), t \ge 0)$ is a bounded
submartingale. Let $\zeta = \inf\{t : N - X_0(t) \ge \varepsilon N\}$. By the Optional Sampling Theorem, we have
$E[N - X_0(T) \mid \zeta \le T] \ge \varepsilon N$. Since the rate of type 1 mutations is always bounded by $N\mu$, we
have $E[N - X_0(T)] \le N\mu T$. Therefore,

$$P\left(\max_{0 \le t \le T} |X_0(t) - N| \ge \varepsilon N\right) = P(\zeta \le T) \le \frac{E[N - X_0(T)]}{E[N - X_0(T) \mid \zeta \le T]} \le \frac{N\mu T}{\varepsilon N} \to 0$$

as $N \to \infty$ because $\mu T \to 0$. It follows that when $k = 0$, (20) holds for all $\varepsilon > 0$.

Let $k \ge 1$. Assume that (20) holds with $k-1$ in place of $k$. Let $B_k(t)$ be the number of
type $k$ mutations up to time $t$. Let $S_k(t)$ be the number of times, until time $t$, that a type
$k$ individual gives birth minus the number of times that a type $k$ individual dies. Note that
$X_k(t) = B_k(t) - B_{k+1}(t) + S_k(t)$, so

$$\left|X_k(t) - \frac{N\mu^k t^k}{k!}\right| \le B_{k+1}(t) + |S_k(t)| + \left|B_k(t) - \frac{N\mu^k t^k}{k!}\right|. \tag{21}$$

Therefore, it suffices to show that with probability tending to one as $N \to \infty$, the three terms
on the right-hand side of (21) stay below $\varepsilon N\mu^k T^k/3$ for $t \le T$.

By Lemma 10, for $0 \le t \le T$,

$$E[B_{k+1}(t)] \le \mu \int_0^t E[X_k(s)]\,ds \le \frac{N\mu^{k+1}T^{k+1}}{(k+1)!}.$$
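
By Markov's Inequality,

$$P\left(\max_{0 \le t \le T} B_{k+1}(t) > \frac{\varepsilon}{3}N\mu^k T^k\right) = P\left(B_{k+1}(T) > \frac{\varepsilon}{3}N\mu^k T^k\right) \le \frac{3E[B_{k+1}(T)]}{\varepsilon N\mu^k T^k} \le \frac{3\mu T}{\varepsilon(k+1)!} \to 0 \tag{22}$$

as $N \to \infty$ because $\mu T \to 0$.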

Note that $S_k(0) = 0$, and since type $k$ individuals give birth and die at the same rate, the
process $(S_k(t), 0 \le t \le T)$ is a martingale. By Wald's Second Equation, $E[S_k(T)^2]$ is the
expected number of births plus deaths of type $k$ individuals (not counting replacements of a type
$k$ individual by another type $k$ individual) up to time $T$, which by Lemma 10 is at most

$$2\int_0^T E[X_k(t)]\,dt \le \frac{2N\mu^k T^{k+1}}{(k+1)!}.$$

Therefore, by the $L^2$-Maximal Inequality for martingales,

$$E\left[\max_{0 \le t \le T} |S_k(t)|^2\right] \le 4E[S_k(T)^2] \le \frac{8N\mu^k T^{k+1}}{(k+1)!}.$$

Now using Chebyshev's Inequality,

$$P\left(\max_{0 \le t \le T} |S_k(t)| > \frac{\varepsilon}{3}N\mu^k T^k\right) \le \frac{8N\mu^k T^{k+1}}{(k+1)!}\left(\frac{3}{\varepsilon N\mu^k T^k}\right)^2 = \frac{72}{(k+1)!\,\varepsilon^2 N\mu^k T^{k-1}} \to 0 \tag{23}$$

as $N \to \infty$ because $N\mu^k T^{k-1} \to \infty$.

To bound the third term in (21), note that type $k-1$ individuals mutate to type $k$ at rate
$\mu$. Therefore, there exist inhomogeneous Poisson processes $(N_1(t), t \ge 0)$ and $(N_2(t), t \ge 0)$
whose intensities at time $t$ are given by $N\mu^k t^{k-1}/(k-1)! - \varepsilon N\mu^k T^{k-1}/6$ and $N\mu^k t^{k-1}/(k-1)! + \varepsilon N\mu^k T^{k-1}/6$
respectively such that on the event that

$$\max_{0 \le t \le T} \left|X_{k-1}(t) - \frac{N\mu^{k-1}t^{k-1}}{(k-1)!}\right| \le \frac{\varepsilon}{6}N\mu^{k-1}T^{k-1}, \tag{24}$$

we have $N_1(t) \le B_k(t) \le N_2(t)$ for $0 \le t \le T$. To achieve this coupling, one can begin with
points at the times of type $k$ mutations. To get $(N_1(t), t \ge 0)$, when there is a type $k$ mutation
at time $t$, keep this point with probability $[N\mu^k t^{k-1}/(k-1)! - \varepsilon N\mu^k T^{k-1}/6]/\mu X_{k-1}(t-)$ and remove
it otherwise. To get $(N_2(t), t \ge 0)$, add points of a time-inhomogeneous Poisson process whose rate
at time $t$ is $[N\mu^k t^{k-1}/(k-1)! + \varepsilon N\mu^k T^{k-1}/6] - \mu X_{k-1}(t)$.



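Note that

$$E[N_1(t)] = \int_0^t \left(\frac{N\mu^k s^{k-1}}{(k-1)!} - \frac{\varepsilon}{6}N\mu^k T^{k-1}\right) ds = \frac{N\mu^k t^k}{k!} - \frac{\varepsilon}{6}N\mu^k T^{k-1}t \tag{25}$$

and likewise

$$E[N_2(t)] = \frac{N\mu^k t^k}{k!} + \frac{\varepsilon}{6}N\mu^k T^{k-1}t.$$

The process $(N_1(t) - E[N_1(t)], t \ge 0)$ is a martingale, and

$$E[(N_1(T) - E[N_1(T)])^2] = E[N_1(T)] = \frac{N\mu^k T^k}{k!} - \frac{\varepsilon}{6}N\mu^k T^k. \tag{26}$$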

Therefore, Chebyshev's Inequality and the $L^2$-Maximal Inequality for martingales give

$$P\left(\max_{0 \le t \le T} |N_1(t) - E[N_1(t)]| > \frac{\varepsilon}{6}N\mu^k T^k\right) \le \frac{144\,E[(N_1(T) - E[N_1(T)])^2]}{(\varepsilon N\mu^k T^k)^2} \to 0 \tag{27}$$

as $N \to \infty$ by (26) because $N\mu^k T^k \to \infty$. Combining (25) with (27) gives

$$\lim_{N \to \infty} P\left(\max_{0 \le t \le T} \left|N_1(t) - \frac{N\mu^k t^k}{k!}\right| > \frac{\varepsilon}{3}N\mu^k T^k\right) = 0. \tag{28}$$

The same argument gives

$$\lim_{N \to \infty} P\left(\max_{0 \le t \le T} \left|N_2(t) - \frac{N\mu^k t^k}{k!}\right| > \frac{\varepsilon}{3}N\mu^k T^k\right) = 0. \tag{29}$$

By the induction hypothesis, the event in (24) occurs with probability tending to one
as $N \to \infty$, so $N_1(t) \le B_k(t) \le N_2(t)$ for $0 \le t \le T$ with probability tending to one as $N \to \infty$.
Therefore, equations (28) and (29) imply that

$$\lim_{N \to \infty} P\left(\max_{0 \le t \le T} \left|B_k(t) - \frac{N\mu^k t^k}{k!}\right| > \frac{\varepsilon}{3}N\mu^k T^k\right) = 0. \tag{30}$$

The result follows from (21), (22), (23), and (30). □



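Proof of part 1 of Theorem 4. Suppose $\mu \gg N^{-2/m}$, and let $T = N^{-1/m}\mu^{-1}t$. As $N \to \infty$, we
have $\mu T = N^{-1/m}t \to 0$, $N\mu^{m-1}T^{m-2} = N^{2/m}\mu t^{m-2} \to \infty$, and $N\mu^{m-1}T^{m-1} = N^{1/m}t^{m-1} \to \infty$.
Therefore, by Proposition 11, if $\varepsilon > 0$, then with probability tending to one as $N \to \infty$,

$$\max_{0 \le s \le T} \left|X_{m-1}(s) - \frac{N\mu^{m-1}s^{m-1}}{(m-1)!}\right| \le \varepsilon N\mu^{m-1}T^{m-1}. \tag{31}$$

Because each type $m-1$ individual experiences a type $m$ mutation at rate $\mu$, the random variable

$$V = \int_0^{\tau_m} \mu X_{m-1}(s)\,ds$$

has an exponential distribution with mean one. When (31) holds, we have

$$\frac{N\mu^m T^m}{m!} - \varepsilon N\mu^m T^m \le \int_0^T \mu X_{m-1}(s)\,ds \le \frac{N\mu^m T^m}{m!} + \varepsilon N\mu^m T^m.$$

It follows that

$$\limsup_{N \to \infty} P(\tau_m > T) \le \limsup_{N \to \infty} P\left(V > \frac{N\mu^m T^m}{m!} - \varepsilon N\mu^m T^m\right) = P\left(V > \frac{t^m}{m!} - \varepsilon t^m\right) = \exp\left(-\frac{t^m}{m!} + \varepsilon t^m\right),$$

and likewise

$$\liminf_{N \to \infty} P(\tau_m > T) \ge \liminf_{N \to \infty} P\left(V > \frac{N\mu^m T^m}{m!} + \varepsilon N\mu^m T^m\right) = \exp\left(-\frac{t^m}{m!} - \varepsilon t^m\right).$$

Because these bounds hold for all $\varepsilon > 0$, the result follows. □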
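
The limit in part 1 of Theorem 4 can be explored empirically with a direct simulation. The sketch below is an illustration and not part of the proof. It assumes the model from the abstract (each individual is replaced at rate one and mutates at rate $\mu$), assumes that a replaced individual is overwritten by a copy of an individual drawn uniformly from all $N$, and tracks only the number of individuals of each type. The names `simulate_tau` and `limit_survival_part1` are hypothetical.

```python
import math
import random


def simulate_tau(N, mu, m, rng):
    """Return one sample of tau_m, the first time some individual has m mutations."""
    counts = [N] + [0] * (m - 1)   # counts[k] = number of individuals with k mutations
    types = range(m)
    total_rate = N * (1.0 + mu)    # N replacement clocks plus N mutation clocks
    t = 0.0
    while True:
        t += rng.expovariate(total_rate)
        if rng.random() < mu / (1.0 + mu):
            # Mutation: a uniformly chosen individual gains one mutation.
            k = rng.choices(types, weights=counts)[0]
            if k == m - 1:
                return t           # a type m individual has appeared
            counts[k] -= 1
            counts[k + 1] += 1
        else:
            # Replacement: a uniformly chosen individual dies and is replaced by
            # a copy of a uniformly chosen individual (assumed to include itself).
            dead = rng.choices(types, weights=counts)[0]
            parent = rng.choices(types, weights=counts)[0]
            counts[dead] -= 1
            counts[parent] += 1


def limit_survival_part1(t, m):
    """Limiting value of P(N^{1/m} mu tau_m > t) stated in part 1 of Theorem 4."""
    return math.exp(-t**m / math.factorial(m))
```

To compare with the theorem, one would rescale each sample by $N^{1/m}\mu$ and compare the empirical fraction exceeding $t$ with `limit_survival_part1(t, m)`. Agreement should only be expected when $N$ is large, $\mu$ is small, and $\mu$ is well above $N^{-2/m}$, since the statement is asymptotic.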

We now work towards proving part 2 of Theorem 4. For the rest of this section, we assume
that

$$N^{-1/(1+(m-j-2)2^{-(j+1)})} \ll \mu \ll N^{-1/(1+(m-j-1)2^{-j})} \tag{32}$$

for some $j = 1, \dots, m-2$. This condition implies that $N\mu \to \infty$ and $\mu \to 0$ as $N \to \infty$, and
therefore

$$N\mu^{1-2^{-j}} \to \infty. \tag{33}$$

Also, for the rest of this section, $t$ is fixed and

$$T = N^{-1/(m-j)}\mu^{-1-(1-2^{-j})/(m-j)}t. \tag{34}$$

This means that

$$N\mu^{m-j}T^{m-j} = \mu^{-(1-2^{-j})}t^{m-j}. \tag{35}$$

Let $\varepsilon > 0$. Let $G_N$ be the event that

$$\max_{0 \le s \le T} \left|X_{m-j-1}(s) - \frac{N\mu^{m-j-1}s^{m-j-1}}{(m-j-1)!}\right| \le \varepsilon N\mu^{m-j-1}T^{m-j-1}.$$

The next lemma shows that $G_N$ occurs with high probability, indicating that on the time scale
of interest, the number of individuals with $m-j-1$ mutations stays close to its expectation.

Lemma 12. We have $\lim_{N \to \infty} P(G_N) = 1$.

Proof. We need to verify the conditions of Proposition 11 with $m-j-1$ in place of $k$. By (33),
as $N \to \infty$,

$$\mu T = N^{-1/(m-j)}\mu^{-(1-2^{-j})/(m-j)}t = (N\mu^{1-2^{-j}})^{-1/(m-j)}t \to 0. \tag{36}$$

Also, using the first inequality in (32),

$$N\mu^{m-j-1}T^{m-j-2} = N^{1-(m-j-2)/(m-j)}\mu^{m-j-1-(m-j-2)-(m-j-2)(1-2^{-j})/(m-j)}t^{m-j-2}$$
$$= N^{2/(m-j)}\mu^{2/(m-j)+(m-j-2)2^{-j}/(m-j)}t^{m-j-2}$$
$$= (N\mu^{1+(m-j-2)2^{-(j+1)}})^{2/(m-j)}t^{m-j-2} \to \infty. \tag{37}$$

Using the second inequality in (32) and the fact that $m-j+1-2^{-j} > 1+(m-j-1)2^{-j}$,

$$T = (N\mu^{m-j+1-2^{-j}})^{-1/(m-j)}t \gg (N^{1-(m-j+1-2^{-j})/(1+(m-j-1)2^{-j})})^{-1/(m-j)}t \to \infty.$$

This result and (37) imply $N\mu^{m-j-1}T^{m-j-1} \to \infty$, which, in combination with (36) and (37),
gives the lemma. □
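
The exponent bookkeeping behind Lemma 12 is easy to get wrong by hand, so a small check may help. The sketch below is illustrative only. It assumes the time scale $T = N^{-1/(m-j)}\mu^{-1-(1-2^{-j})/(m-j)}t$ suggested by part 2 of Theorem 4, represents a monomial $N^a\mu^b$ by the pair of exact rationals $(a, b)$, and ignores powers of the fixed constant $t$. The helper names `T_exponents` and `monomial` are hypothetical.

```python
from fractions import Fraction


def T_exponents(m, j):
    """Exponents (a, b) such that T = N**a * mu**b * t on the assumed time scale."""
    d = m - j
    a = Fraction(-1, d)
    b = -1 - (1 - Fraction(1, 2**j)) / d
    return a, b


def monomial(m, j, n_pow, mu_pow, T_pow):
    """Exponents of N and mu in N**n_pow * mu**mu_pow * T**T_pow."""
    a, b = T_exponents(m, j)
    return n_pow + T_pow * a, mu_pow + T_pow * b


# Example with m = 5, j = 2, so that m - j = 3.
m, j = 5, 2
d = m - j
half_pow = 1 - Fraction(1, 2**j)            # the exponent 1 - 2^{-j}

# N mu^{m-j} T^{m-j} should equal mu^{-(1 - 2^{-j})}, up to powers of t.
check_35 = monomial(m, j, 1, d, d) == (0, -half_pow)

# mu T should equal (N mu^{1 - 2^{-j}})^{-1/(m-j)}.
check_36 = monomial(m, j, 0, 1, 1) == (Fraction(-1, d), -half_pow / d)

# N mu^{m-j-1} T^{m-j-2} should equal (N mu^{1+(m-j-2)2^{-(j+1)}})^{2/(m-j)}.
inner = 1 + Fraction(m - j - 2, 2**(j + 1))
check_37 = monomial(m, j, 1, d - 1, d - 2) == (Fraction(2, d), Fraction(2, d) * inner)
```

All three flags evaluate to `True` for this choice of $m$ and $j$, which matches the three monomial identities used in the proof. The limits themselves still rely on the two inequalities assumed for $\mu$ in this section.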

The rest of the proof of part 2 of Theorem 4 is similar to the proof of Theorem 2 in [10]. It
depends on the following result on Poisson approximation, which is part of Theorem 1 of [1] and
was used also in [10].
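
Lemma 13. Suppose $(A_i)_{i \in \mathcal{I}}$ is a collection of events, where $\mathcal{I}$ is any index set. Let
$W = \sum_{i \in \mathcal{I}} 1_{A_i}$ be the number of events that occur, and let $\lambda = E[W] = \sum_{i \in \mathcal{I}} P(A_i)$. Suppose for each
$i \in \mathcal{I}$, we have $i \in \beta_i \subset \mathcal{I}$. Let $\mathcal{F}_i = \sigma((A_j)_{j \in \mathcal{I} \setminus \beta_i})$. Define

$$b_1 = \sum_{i \in \mathcal{I}} \sum_{j \in \beta_i} P(A_i)P(A_j),$$

$$b_2 = \sum_{i \in \mathcal{I}} \sum_{i \ne j \in \beta_i} P(A_i \cap A_j),$$

$$b_3 = \sum_{i \in \mathcal{I}} E\big[|P(A_i \mid \mathcal{F}_i) - P(A_i)|\big].$$

Then $|P(W = 0) - e^{-\lambda}| \le b_1 + b_2 + b_3$.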
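
A minimal numerical illustration of this kind of bound, in the simplest possible setting, is given below. It assumes independent events with neighborhoods $\beta_i = \{i\}$, so that the two dependence terms vanish and only the term $\sum_i P(A_i)^2$ remains; this special case is chosen here for illustration and is not the setting of Lemma 16. The function name `poisson_gap_independent` is hypothetical.

```python
import math


def poisson_gap_independent(probs):
    """Compare P(W = 0) with exp(-lambda) for independent events.

    Returns (gap, b1), where gap = |P(W = 0) - exp(-lambda)| and
    b1 = sum of P(A_i)^2, the only nonzero error term when beta_i = {i}
    and the events are independent.
    """
    lam = sum(probs)
    p_none = math.prod(1.0 - p for p in probs)  # P(no event occurs)
    b1 = sum(p * p for p in probs)
    return abs(p_none - math.exp(-lam)), b1


gap, b1 = poisson_gap_independent([0.01] * 100)
```

Here `gap` is roughly 0.002 while `b1` equals 0.01, so the approximation error sits comfortably within the bound.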

We will use the next lemma to get the second moment estimate needed to bound $b_2$. When
we apply this result, the individuals born at times $t_1$ and $t_2$ will both have the same type. We
use different types in the statement of the lemma to make it easier to distinguish the descendants
of the two individuals. This result is Lemma 5.2 of [10].

Lemma 14. Fix times $t_1 < t_2$. Consider a population of size $N$ which evolves according to the
Moran model in which all individuals initially have type 0. There are no mutations, except that
one individual becomes type 1 at time $t_1$, and one type 0 individual (if there is one) becomes type
2 at time $t_2$. Fix a positive integer $L \le N/2$. For $i = 1, 2$, let $Y_i(t)$ be the number of type $i$
individuals at time $t$ and let $B_i$ be the event that $L \le \max_{t \ge 0} Y_i(t) \le N/2$. Then

$$P(B_1 \cap B_2) \le 2/L^2.$$

Lemma 15. Consider the model introduced in Proposition 1. Assume $N\mu^{1-2^{-j}} \to \infty$ as $N \to \infty$.
We define the following three events:

1. Let $R_1$ be the event that eventually a type $j+1$ individual is born.

2. Let $R_2$ be the event that the maximum number of individuals of nonzero type at any time
is between $\varepsilon\mu^{-1+2^{-j}}$ and $N/2$.

3. Let $R_3$ be the event that all individuals still alive at time $\varepsilon^{-1}\mu^{-1+2^{-j}}$ have type zero.

Let $\bar q_{j+1} = P(R_1 \cap R_2 \cap R_3)$. Then there exists a constant $C$, not depending on $\varepsilon$, such that
$q_{j+1} - C\varepsilon\mu^{1-2^{-j}} \le \bar q_{j+1} \le q_{j+1}$.

Proof. Because $q_{j+1} = P(R_1)$, the inequality $\bar q_{j+1} \le q_{j+1}$ is immediate. We need to show that
$P(R_1 \cap (R_2^c \cup R_3^c)) \le C\varepsilon\mu^{1-2^{-j}}$. Because $\varepsilon^{-1}\mu^{-1+2^{-j}} \le N$ for sufficiently large $N$, we have
$P(R_3^c) \le C\varepsilon\mu^{1-2^{-j}}$ by (10). It remains to show that $P(R_1 \cap R_2^c) \le C\varepsilon\mu^{1-2^{-j}}$.

The probability that the number of individuals of nonzero type ever exceeds $N/2$ is at most
$2/N \ll \varepsilon\mu^{1-2^{-j}}$. By (8) and the fact that each type 1 individual experiences type 2 mutations
at rate $\mu$, the expected number of type 2 mutations while there are $k$ individuals of nonzero type
is at most $(k\mu)(1/k) = \mu$. Therefore, the expected number of type 2 mutations while there are
fewer than $\varepsilon\mu^{-1+2^{-j}}$ individuals of nonzero type is at most $\varepsilon\mu^{2^{-j}}$. The probability that a given
type 2 mutation has a type $j+1$ descendant is at most $C\mu^{1-2^{-(j-1)}}$ by Proposition 1. It now
follows, using Markov's Inequality, that the probability that some type 2 mutation that occurs
while there are fewer than $\varepsilon\mu^{-1+2^{-j}}$ individuals of nonzero type has a type $j+1$ descendant is
at most $C\varepsilon\mu^{2^{-j}+1-2^{-(j-1)}} = C\varepsilon\mu^{1-2^{-j}}$. Thus, $P(R_1 \cap R_2^c) \le C\varepsilon\mu^{1-2^{-j}}$. The result follows. □



We now define the events to which we will apply Lemma 13. Divide the interval $[0, T]$ into $M$
subintervals of equal length called $I_1, \dots, I_M$, where $M$ will tend to infinity with $N$. Because
type $m-j-1$ individuals experience type $m-j$ mutations at rate $\mu$, we can construct an
inhomogeneous Poisson process $K$ on $[0, T]$ whose intensity at time $s$ is given by

$$\frac{N\mu^{m-j}s^{m-j-1}}{(m-j-1)!} + \varepsilon N\mu^{m-j}T^{m-j-1} \tag{38}$$

such that on the event $G_N$, all the times of the type $m-j$ mutations before time $T$ are points
of $K$. Let $D_i$ be the event that there is a point of $K$ in the interval $I_i$. Let $\xi_1, \xi_2, \dots, \xi_M$ be
i.i.d. $\{0, 1\}$-valued random variables, independent of $K$ and the population process, such that
$P(\xi_i = 1) = \bar q_{j+1}$ for all $i$, where $\bar q_{j+1}$ comes from Lemma 15. Let $A_i$ be the event that $D_i$ occurs,
and one of the following occurs:

• The first point of $K$ in $I_i$ is the time of a type $m-j$ mutation, and the three events defined
in Lemma 15 hold. That is, the type $m-j$ mutation eventually has a type $m$ descendant,
the maximum number of descendants that it has in the population at any future time is
between $\varepsilon\mu^{-1+2^{-j}}$ and $N/2$, and it has no descendants remaining a time $\varepsilon^{-1}\mu^{-1+2^{-j}}$ after
the mutation occurs.

• There is no mutation at the time of the first point of $K$ in $I_i$, and $\xi_i = 1$.

Let $W = \sum_{i=1}^M 1_{A_i}$ be the number of the events $A_i$ that occur, and let $\lambda = E[W]$.

Lemma 16. We have $\limsup_{N \to \infty} |P(W = 0) - e^{-\lambda}| = 0$.

Proof. Let $\beta_i$ be the set of all $h \le M$ such that the distance between the intervals $I_i$ and $I_h$ is at
most $\varepsilon^{-1}\mu^{-1+2^{-j}}$. Define $b_1$, $b_2$, and $b_3$ as in Lemma 13. We need to show that $b_1$, $b_2$, and $b_3$ all
tend to zero as $N \to \infty$.

It is clear from properties of Poisson processes that the events $D_1, \dots, D_M$ are independent,
and it is clear from the construction that $P(A_i \mid D_i) = \bar q_{j+1}$ for all $i$. The events $A_1, \dots, A_M$ are
not independent because mutations in two intervals $I_h$ and $I_i$ may have descendants alive at the
same time. However, if $I_i = [a, b]$, then the third event in Lemma 15 guarantees that whether or
not $A_i$ has occurred is determined by time $b + \varepsilon^{-1}\mu^{-1+2^{-j}}$, and therefore $A_i$ is independent of all
$A_h$ with $h \notin \beta_i$. It follows that $b_3 = 0$.

The length $|I_i|$ of the interval $I_i$ is $T/M$. In view of (38),

$$P(D_i) \le CN\mu^{m-j}T^{m-j-1}|I_i| = CN\mu^{m-j}T^{m-j}/M. \tag{39}$$

Because (33) holds, we can apply Proposition 1 to get $\bar q_{j+1} \le q_{j+1} \le C\mu^{1-2^{-j}}$. Therefore, using
also (35),

$$P(A_i) = P(D_i)\bar q_{j+1} \le \frac{CN\mu^{m-j+1-2^{-j}}T^{m-j}}{M} \le \frac{C}{M}$$

for all $i$. There are at most $2(1 + \varepsilon^{-1}\mu^{-1+2^{-j}}/|I_i|) \le C\varepsilon^{-1}\mu^{-1+2^{-j}}M/T$ indices in $\beta_i$. It follows
that

$$b_1 \le M\left(\frac{C\varepsilon^{-1}\mu^{-1+2^{-j}}M}{T}\right)\left(\frac{C}{M}\right)^2 \le C\varepsilon^{-1}\mu^{-1+2^{-j}}T^{-1}$$
$$\le C\varepsilon^{-1}\mu^{-1+2^{-j}}N^{1/(m-j)}\mu^{1+(1-2^{-j})/(m-j)}$$
$$= C\varepsilon^{-1}(N\mu^{1+2^{-j}(m-j-1)})^{1/(m-j)} \to 0 \tag{40}$$

as $N \to \infty$, using the second inequality in (32).

To bound $b_2$, suppose $h \ne i$. Suppose $D_h$ and $D_i$ both occur. If the first points of the Poisson
process in $I_h$ and $I_i$ are times of type $m-j$ mutations, then for $A_h \cap A_i$ to occur, the event
$B_1 \cap B_2$ in Lemma 14 must occur with $L = \varepsilon\mu^{-1+2^{-j}}$. It follows that

$$P(A_h \cap A_i \mid D_h \cap D_i) \le \max\{2/(\varepsilon\mu^{-1+2^{-j}})^2, \bar q_{j+1}^2\} \le C\varepsilon^{-2}\mu^{2(1-2^{-j})}.$$

Therefore, using (39), (35), and the fact that $P(D_h \cap D_i) = P(D_h)P(D_i)$ by independence,

$$P(A_h \cap A_i) \le \left(\frac{CN\mu^{m-j}T^{m-j}}{M}\right)^2 C\varepsilon^{-2}\mu^{2(1-2^{-j})} \le \frac{C\varepsilon^{-2}}{M^2}.$$

Thus, by reasoning as in (40), we get

$$b_2 \le M\left(\frac{C\varepsilon^{-1}\mu^{-1+2^{-j}}M}{T}\right)\left(\frac{C\varepsilon^{-2}}{M^2}\right) \le C\varepsilon^{-3}\mu^{-1+2^{-j}}T^{-1} \to 0$$

as $N \to \infty$, which completes the proof. □

Lemma 17. Let $\sigma_m$ be the time of the first type $m-j$ mutation that will have a type $m$ descendant.
Then

$$\lim_{N \to \infty} P(\sigma_m > T) = \exp\left(-\frac{t^{m-j}}{(m-j)!}\right).$$

Proof. We claim there is a constant $C$, not depending on $\varepsilon$, such that for sufficiently large $N$,

$$\left|\lambda - \frac{t^{m-j}}{(m-j)!}\right| \le C\varepsilon, \tag{41}$$

where $\lambda$ comes from Lemma 16, and

$$|P(W = 0) - P(\sigma_m > T)| \le C\varepsilon. \tag{42}$$

The result follows from this claim by letting $\varepsilon \to 0$ and applying Lemma 16.

Recall that we have divided the interval $[0, T]$ into the subintervals $I_1, \dots, I_M$. By letting
$M$ tend to infinity sufficiently rapidly as $N$ tends to infinity, we can ensure that the expected
number of points of the Poisson process $K$ that are in the same subinterval as some other point
tends to zero as $N \to \infty$. Therefore, $\sum_{i=1}^M P(D_i)$ is asymptotically equivalent to the expected
number of points of $K$. That is,

$$\sum_{i=1}^M P(D_i) \sim \int_0^T \left(\frac{N\mu^{m-j}s^{m-j-1}}{(m-j-1)!} + \varepsilon N\mu^{m-j}T^{m-j-1}\right) ds = \frac{N\mu^{m-j}T^{m-j}}{(m-j)!} + \varepsilon N\mu^{m-j}T^{m-j}. \tag{43}$$

Now

$$\lambda = \sum_{i=1}^M P(A_i) = \bar q_{j+1}\sum_{i=1}^M P(D_i),$$

so using Proposition 1, the second inequality in Lemma 15, (43), and (35),

$$\limsup_{N \to \infty} \lambda \le \limsup_{N \to \infty} \mu^{1-2^{-j}}\left(\frac{N\mu^{m-j}T^{m-j}}{(m-j)!} + \varepsilon N\mu^{m-j}T^{m-j}\right) = \frac{t^{m-j}}{(m-j)!} + t^{m-j}\varepsilon. \tag{44}$$

Likewise, dropping the second term and using the first inequality in Lemma 15, we get

$$\liminf_{N \to \infty} \lambda \ge \liminf_{N \to \infty} (1 - C\varepsilon)\mu^{1-2^{-j}}\left(\frac{N\mu^{m-j}T^{m-j}}{(m-j)!}\right) = \frac{(1 - C\varepsilon)t^{m-j}}{(m-j)!}. \tag{45}$$

Equations (44) and (45) imply (41).

It remains to prove (42). The only way to have $W > 0$ and $\sigma_m > T$ is if for some $i$, there
is a point of $K$ in $I_i$ that is not the time of a type $m-j$ mutation and $\xi_i = 1$. On $G_N$, points
of $K$ that are not mutation times occur at rate at most $2\varepsilon N\mu^{m-j}T^{m-j-1}$. Because the Poisson
process runs for time $T$ and $P(\xi_i = 1) = \bar q_{j+1} \le C\mu^{1-2^{-j}}$ by Lemma 15 and Proposition 1, we
have, using (35),

$$P(W > 0 \text{ and } \sigma_m > T) \le P(G_N^c) + C\varepsilon N\mu^{m-j+1-2^{-j}}T^{m-j} \le P(G_N^c) + C\varepsilon. \tag{46}$$

We can have $W = 0$ with $\sigma_m \le T$ in two ways. One possibility is that two points of $K$ occur
in the same subinterval, an event whose probability goes to zero if $M$ goes to infinity sufficiently
rapidly with $N$. The other possibility is that some type $m-j$ mutation before time $T$ could
have a type $m$ descendant but fail to satisfy one of the other two conditions of Lemma 15. The
probability of this event is at most

$$P(G_N^c) + CN\mu^{m-j}T^{m-j}(q_{j+1} - \bar q_{j+1}) \le P(G_N^c) + C\varepsilon N\mu^{m-j+1-2^{-j}}T^{m-j} \le P(G_N^c) + C\varepsilon \tag{47}$$

by Lemma 15 and (35). Equation (42) follows from (46), (47), and Lemma 12. □

Proof of part 2 of Theorem 4. Recall the definition of $T$ from (34). Define $\sigma_m$ to be the time of
the first type $m-j$ mutation that will have a type $m$ descendant. Then $\sigma_m \le \tau_m$, and by Lemma
17, it suffices to show that

$$\lim_{N \to \infty} P(\sigma_m < T \text{ and } \tau_m - \sigma_m > \delta N^{-1/(m-j)}\mu^{-1-(1-2^{-j})/(m-j)}) = 0 \tag{48}$$

for all $\delta > 0$. The event in (48) can only occur if some type $m-j$ mutation before time
$T$ either fixates or takes longer than time $\delta N^{-1/(m-j)}\mu^{-1-(1-2^{-j})/(m-j)}$ to disappear from the
population. By Lemma 10, before time $T$ the expected rate of type $m-j$ mutations is at
most $CN\mu^{m-j}T^{m-j-1}$, so the expected number of type $m-j$ mutations by time $T$ is at most
$CN\mu^{m-j}T^{m-j}$. Because the probability that a mutation fixates is $1/N$, the probability that some
type $m-j$ mutation before time $T$ fixates is at most $C\mu^{m-j}T^{m-j}$, which goes to zero as $N \to \infty$
because $\mu T \to 0$ by (36).

Next, note that $\delta N^{-1/(m-j)}\mu^{-1-(1-2^{-j})/(m-j)} \ll N$, which can be seen by dividing both sides
by $N$ and observing that $\delta(N\mu)^{-1}(N\mu^{1-2^{-j}})^{-1/(m-j)} \to 0$ because $N\mu \to \infty$ and $N\mu^{1-2^{-j}} \to \infty$.
Therefore, for sufficiently large $N$, we can apply (10) to show that the probability that a given
mutation lasts longer than time $\delta N^{-1/(m-j)}\mu^{-1-(1-2^{-j})/(m-j)}$ before disappearing or fixating is
at most $C\delta^{-1}N^{1/(m-j)}\mu^{1+(1-2^{-j})/(m-j)}$. Thus, the probability that some mutation before time
$T$ lasts this long is at most

$$C\delta^{-1}N^{1/(m-j)}\mu^{1+(1-2^{-j})/(m-j)} \cdot CN\mu^{m-j}T^{m-j} = C\delta^{-1}(N\mu^{1+(m-j-1)2^{-j}})^{1/(m-j)}t^{m-j} \to 0$$

by the second inequality in (32), and (48) follows. □

5 Proof of part 3 of Theorem 4

Throughout this section, we assume

$$\mu \sim A N^{-1/(1+(m-j-1)2^{-j})} \qquad (49)$$

for some $j = 1, \dots, m-1$, as in part 3 of Theorem 4. Also, let $T = \mu^{-(1-2^{-j})}t$. Then

$$\lim_{N\to\infty} N\mu^{m-j}T^{m-j}\mu^{1-2^{-j}} = \lim_{N\to\infty} N\mu^{1+(m-j-1)2^{-j}}t^{m-j} = A^{1+(m-j-1)2^{-j}}t^{m-j}. \qquad (50)$$
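The exponent bookkeeping behind (50) can be checked with exact rational arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the code verifies nothing beyond the identity that substituting $T = \mu^{-(1-2^{-j})}t$ turns $\mu^{m-j}T^{m-j}\mu^{1-2^{-j}}$ into $\mu^{1+(m-j-1)2^{-j}}$.

```python
from fractions import Fraction

def exponent_of_mu(m, j):
    """Exponent of mu in N mu^{m-j} T^{m-j} mu^{1-2^{-j}} when T = mu^{-(1-2^{-j})} t."""
    a = 1 - Fraction(1, 2 ** j)          # a = 1 - 2^{-j}
    # mu^{m-j} contributes (m-j), T^{m-j} contributes -(m-j)a, mu^{1-2^{-j}} contributes a.
    return (m - j) - (m - j) * a + a

def claimed_exponent(m, j):
    """Exponent 1 + (m-j-1) 2^{-j} appearing in (49) and (50)."""
    return 1 + (m - j - 1) * Fraction(1, 2 ** j)

# Compare the two expressions for all admissible j = 1, ..., m-1 and small m.
agree = all(exponent_of_mu(m, j) == claimed_exponent(m, j)
            for m in range(2, 9) for j in range(1, m))
```

With `m = 3` and `j = 1`, for instance, both functions return `Fraction(3, 2)`.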

We first show that the number of individuals of type m — j — 1 is approximately deterministic 
through time T. 



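Lemma 18. Let $\varepsilon > 0$. Let $G_N(\varepsilon)$ be the event that

$$\max_{0 \le s \le T}\left|X_{m-j-1}(s) - \frac{N\mu^{m-j-1}s^{m-j-1}}{(m-j-1)!}\right| \le \varepsilon N\mu^{m-j-1}T^{m-j-1}.$$

Then $\lim_{N\to\infty} P(G_N(\varepsilon)) = 1$.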



Proof. As in the proof of Lemma 12, we need to check the conditions of Proposition 11 with
$m-j-1$ in place of $k$. Because $\mu \to 0$ as $N \to \infty$, we have

$$\mu T = \mu^{2^{-j}}t \to 0 \qquad (51)$$

as $N \to \infty$. Also, using that $\mu \sim AN^{-1/(1+(m-j-1)2^{-j})} \gg N^{-1/(1+(m-j-2)2^{-j})}$, we have

$$N\mu^{m-j-1}T^{m-j-2} = N\mu^{m-j-1}\mu^{-(1-2^{-j})(m-j-2)}t^{m-j-2} = N\mu^{1+(m-j-2)2^{-j}}t^{m-j-2} \to \infty$$

as $N \to \infty$. Since $T \to \infty$ as $N \to \infty$, we also have $N\mu^{m-j-1}T^{m-j-1} \to \infty$ as $N \to \infty$, and the
lemma follows. □
Although the number of type $m-j-1$ individuals is approximately deterministic, there are
stochastic effects both from the number of type $m-j$ individuals in the population and from
the time that elapses between the appearance of the type $m-j$ mutation that will have a type
$m$ descendant and the birth of the type $m$ descendant. Further complicating the proof is that
because births and deaths occur at the same time in the Moran model, the fates of two type $m-j$
mutations that occur at different times are not independent, nor is the number of type $m-j$
individuals in the population independent of whether or not the type $m-j+1$ mutations succeed
in producing a type $m$ descendant. Our proof is very similar to the proof of Proposition 4.1 in
[10] and involves a comparison between the Moran model and a two-type branching process. To
carry out this comparison, we introduce five models.

Model 1: This will be the original model described in the introduction. 

Model 2: This model is the same as Model 1 except that there are no type 1 mutations and
no individuals of types $1, \dots, m-j-1$. Instead, at times of an inhomogeneous Poisson process
whose rate at time $s$ is $N\mu^{m-j}s^{m-j-1}/(m-j-1)!$, a type zero individual (if there is one) becomes
type $m-j$.

Model 3: This model is the same as Model 2, except that type $m-j+1$ mutations are suppressed
when there is another individual of type $m-j+1$ or higher already in the population.

Model 4: This model is the same as Model 3, except that two changes are made so that the
evolution of type $m-j+1$ individuals and their offspring is decoupled from the evolution of the
type $m-j$ individuals:

• Whenever there would be a transition that involves exchanging a type $m-j$ individual
with an individual of type $k \ge m-j+1$, we instead exchange a randomly chosen type 0
individual with a type $k$ individual.

• At the times of type $m-j+1$ mutations, a randomly chosen type 0 individual, rather than
a type $m-j$ individual, becomes type $m-j+1$.

Model 5: This model is a two-type branching process with immigration. Type $m-j$ individuals
immigrate at times of an inhomogeneous Poisson process whose rate at time $s$ is
$N\mu^{m-j}s^{m-j-1}/(m-j-1)!$. Each individual gives birth at rate 1 and dies at rate 1, and
type $m-j$ individuals become type $m$ at rate $\mu q_j$, where $q_j$ comes from Proposition 1.
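Models 2 through 5 are all driven by a Poisson process with the polynomial rate $N\mu^{m-j}s^{m-j-1}/(m-j-1)!$. One standard way to sample such a process is to time-change a unit-rate Poisson process through the inverse of the cumulative intensity. The sketch below is an illustration under our own naming: `c` stands for $N\mu^{m-j}$, `k` stands for $m-j$, and neither the function names nor the sampling method come from the paper.

```python
import math
import random

def cumulative_intensity(s, c, k):
    """Integral of c * u**(k-1) / (k-1)! over [0, s], which equals c * s**k / k!."""
    return c * s ** k / math.factorial(k)

def sample_arrivals(c, k, horizon, rng):
    """Arrival times on [0, horizon] of a Poisson process with rate
    c * s**(k-1) / (k-1)!, obtained by time-changing a unit-rate process."""
    limit = cumulative_intensity(horizon, c, k)
    arrivals = []
    total = 0.0
    while True:
        total += rng.expovariate(1.0)     # next point of the unit-rate process
        if total > limit:
            return arrivals
        # Invert x = c * s**k / k! to map the point back to the original time scale.
        arrivals.append((total * math.factorial(k) / c) ** (1.0 / k))
```

The number of arrivals on $[0, T]$ is Poisson with mean `cumulative_intensity(T, c, k)`, which is the quantity $N\mu^{m-j}T^{m-j}/(m-j)!$ that recurs in the bounds of this section.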

For $i = 1, 2, 3, 4, 5$, let $Y_i(s)$ be the number of type $m-j$ individuals in Model $i$ at time $s$,
and let $Z_i(s)$ be the number of individuals in Model $i$ at time $s$ of type $m-j+1$ or higher.
Let $r_i(s)$ be the probability that through time $s$, there has never been a type $m$ individual in
Model $i$. Note that $r_1(T) = P(\tau_m > T)$, so to prove part 3 of Theorem 4 we need to calculate
$\lim_{N\to\infty} r_1(T)$. We will first find $\lim_{N\to\infty} r_5(T)$ and then bound $|r_i(T) - r_{i+1}(T)|$ for $i = 1, 2, 3, 4$.
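For small populations, the quantity $r_1(T)$ can be estimated directly by simulating Model 1. The sketch below is a rough illustration, not the paper's construction. It assumes that a replaced individual takes the type of an individual chosen uniformly at random from the population (possibly itself), and all function and parameter names are hypothetical.

```python
import random

def no_type_m_by(N, mu, m, horizon, rng):
    """Run Model 1 once; return True if no type m individual appears by `horizon`."""
    types = [0] * N
    s = 0.0
    total_rate = N * (1.0 + mu)           # each individual: replacement at rate 1, mutation at rate mu
    while True:
        s += rng.expovariate(total_rate)
        if s > horizon:
            return True
        i = rng.randrange(N)              # the individual affected by this event
        if rng.random() < mu / (1.0 + mu):
            types[i] += 1                 # mutation: the type increases by one
            if types[i] >= m:
                return False
        else:
            # Replacement (assumption: the new type is copied from a uniformly chosen individual).
            types[i] = types[rng.randrange(N)]

def estimate_r1(N, mu, m, horizon, runs, seed=0):
    """Monte Carlo estimate of r_1(horizon) = P(tau_m > horizon)."""
    rng = random.Random(seed)
    return sum(no_type_m_by(N, mu, m, horizon, rng) for _ in range(runs)) / runs
```

When `m = 1` the first mutation is the first type 1 individual, so the estimate should be close to $e^{-N\mu T}$, which gives a simple sanity check of the simulation.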

5.1 A two-type branching process with immigration 

Here we consider Model 5. Our analysis is based on the following lemma concerning two-type
branching processes, which is proved in section 2 of [10]; see equation (2.4).
Lemma 19. Consider a continuous-time two-type branching process started with a single type 1
individual. Each type 1 individual gives birth and dies at rate one, and mutates to type 2 at rate
$r$. Let $f(t)$ be the probability that a type 2 individual is born by time $t$. If $r$ and $t$ depend on $N$
with $r \to 0$ and $r^{1/2}t \to s$ as $N \to \infty$, then

$$\lim_{N\to\infty} r^{-1/2}f(t) = \frac{1 - e^{-2s}}{1 + e^{-2s}}.$$

Lemma 20. We have

$$\lim_{N\to\infty} r_5(T) = \exp\left(-\frac{A^{1+(m-j-1)2^{-j}}}{(m-j-1)!}\int_0^t (t-s)^{m-j-1}\,\frac{1-e^{-2s}}{1+e^{-2s}}\,ds\right). \qquad (52)$$

Proof. Let $g(w)$ be the probability that in Model 5, a type $m-j$ individual that immigrates at
time $w$ has a type $m$ descendant by time $T$. Because type $m-j$ individuals immigrate at times
of an inhomogeneous Poisson process whose rate at time $w$ is $N\mu^{m-j}w^{m-j-1}/(m-j-1)!$, we
have

$$r_5(T) = \exp\left(-\frac{1}{(m-j-1)!}\int_0^T N\mu^{m-j}w^{m-j-1}g(w)\,dw\right). \qquad (53)$$

Making the substitution $s = \mu^{1-2^{-j}}w$, we get

$$\int_0^T N\mu^{m-j}w^{m-j-1}g(w)\,dw = \int_0^t N\mu^{1+(m-j-1)2^{-j}}s^{m-j-1}g(\mu^{-(1-2^{-j})}s)\,\mu^{-(1-2^{-j})}\,ds. \qquad (54)$$
As $N \to \infty$, we have $N\mu^{1+(m-j-1)2^{-j}} \to A^{1+(m-j-1)2^{-j}}$ by (49). Note also that
$g(\mu^{-(1-2^{-j})}s) = f(\mu^{-(1-2^{-j})}(t-s))$, where $f$ is the function in Lemma 19 when $r = \mu q_j$. Also, by
Proposition 1, $\mu q_j \sim \mu \cdot \mu^{1-2^{-(j-1)}} = (\mu^{1-2^{-j}})^2$, so $r^{-1/2} \sim \mu^{-(1-2^{-j})}$ and
$r^{1/2}\mu^{-(1-2^{-j})}(t-s) \to t-s$ as $N \to \infty$. Therefore, by Lemma 19,

$$\lim_{N\to\infty} \mu^{-(1-2^{-j})}g(\mu^{-(1-2^{-j})}s) = \frac{1-e^{-2(t-s)}}{1+e^{-2(t-s)}}.$$

Using also (54) and the Dominated Convergence Theorem,

$$\lim_{N\to\infty}\int_0^T N\mu^{m-j}w^{m-j-1}g(w)\,dw = A^{1+(m-j-1)2^{-j}}\int_0^t s^{m-j-1}\,\frac{1-e^{-2(t-s)}}{1+e^{-2(t-s)}}\,ds. \qquad (55)$$

The result follows from (53) and (55) after interchanging the roles of $s$ and $t-s$. □
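The limit in (52) has no closed form in general, but it is straightforward to evaluate numerically because $(1-e^{-2s})/(1+e^{-2s}) = \tanh(s)$. The sketch below uses composite Simpson's rule; the function name, the argument order, and the choice of quadrature are ours and are meant only as an illustration.

```python
import math

def limit_survival(A, m, j, t, steps=2000):
    """Numerically evaluate the right-hand side of (52) by composite Simpson's rule."""
    k = m - j - 1                          # power of (t - s) in the integrand
    if steps % 2:
        steps += 1                         # Simpson's rule needs an even number of subintervals
    h = t / steps

    def integrand(s):
        # (1 - e^{-2s}) / (1 + e^{-2s}) equals tanh(s)
        return (t - s) ** k * math.tanh(s)

    total = integrand(0.0) + integrand(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0
    exponent = 1.0 + k * 2.0 ** (-j)       # 1 + (m-j-1) 2^{-j}
    return math.exp(-(A ** exponent) / math.factorial(k) * integral)
```

In the special case $j = m-1$ the integral is $\int_0^t \tanh(s)\,ds = \log\cosh t$, so the limit reduces to $(\cosh t)^{-A}$, which provides a convenient check on the quadrature.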

5.2 Bounding the number of individuals of type m — j and higher 

We begin with the following lemma, which bounds in all models the expected number of
individuals in the models having type $m-j$ or higher.

Lemma 21. For $i = 1, 2, 3, 4, 5$, we have

$$\max_{0 \le s \le T} E[Y_i(s) + Z_i(s)] \le CN\mu^{m-j}T^{m-j}. \qquad (56)$$

Also, for all five models, the expected number of type $m-j+1$ mutations by time $T$ is at most
$CN\mu^{m-j+1}T^{m-j+1}$.
Proof. Because each type $m-j$ individual experiences type $m-j+1$ mutations at rate $\mu$, the
second statement of the lemma follows easily from the fact that $E[Y_i(s)] \le CN\mu^{m-j}T^{m-j}$, which
is a consequence of (56).

To prove (56), first note that because births and deaths occur at the same rate, in all five
models $E[Y_i(s) + Z_i(s)]$ is the expected number of individuals of types $m-j$ and higher that
appear up to time $s$ as a result of mutations, or immigration in the case of Model 5. For $i = 2, 3, 5$,
these mutation or immigration events occur at times of a rate $N\mu^{m-j}s^{m-j-1}/(m-j-1)!$ Poisson
process (unless they are suppressed in Model 2 or 3 because no type zero individuals remain), so
(56) holds. In Model 1, the mutation rate depends on the number of type $m-j-1$ individuals,
but (56) holds by Lemma 10.

Model 4 is different because type 0 rather than type $m-j$ individuals are replaced at the
times of type $m-j+1$ mutations. The above argument still gives $E[Y_4(s)] \le CN\mu^{m-j}T^{m-j}$ for
$s \le T$ because type $m-j$ individuals give birth and die at the same rate. Thus, the expected
number of type $m-j+1$ mutations by time $T$ is at most $CN\mu^{m-j+1}T^{m-j+1}$. It follows that
$E[Z_4(s)] \le CN\mu^{m-j+1}T^{m-j+1} \le N\mu^{m-j}T^{m-j}$ for $s \le T$, using the fact that $\mu T \to 0$ as $N \to \infty$
by (51). Therefore, (56) holds for Model 4 as well. □

Lemma 21 easily implies the following bound on the maximum number of individuals of type
$m-j$ or higher through time $T$. The lemma below with $f(N) = 1/N$ implies that with probability
tending to one as $N \to \infty$, the number of individuals of type $m-j$ or higher does not reach $N$
before time $T$.

Lemma 22. Suppose $f$ is a function of $N$ such that $N\mu^{(m-j)2^{-j}}f(N) \to 0$ as $N \to \infty$. Then
for $i = 1, 2, 3, 4, 5$, as $N \to \infty$ we have

$$\max_{0 \le s \le T}(Y_i(s) + Z_i(s))\,f(N) \to_p 0. \qquad (57)$$

Proof. Because individuals of type $m-j$ or higher give birth and die at the same rate, and they
can appear but not disappear as a result of mutations, the process $(Y_i(s) + Z_i(s), 0 \le s \le T)$ is
a nonnegative submartingale for $i = 1, 2, 3, 4, 5$. By Doob's Maximal Inequality, for all $\delta > 0$,

$$P\left(\max_{0 \le s \le T}(Y_i(s) + Z_i(s)) \ge \frac{\delta}{f(N)}\right) \le \frac{f(N)\,E[Y_i(T) + Z_i(T)]}{\delta}. \qquad (58)$$

Since $N\mu^{m-j}T^{m-j} = N\mu^{(m-j)2^{-j}}t^{m-j}$, equation (56) implies that if $N\mu^{(m-j)2^{-j}}f(N) \to 0$ as
$N \to \infty$, then the right-hand side of (58) goes to zero as $N \to \infty$ for all $\delta > 0$, which proves
(57). □

5.3 Comparing Models 1 and 2

In this subsection, we establish the following result which controls the difference between Model 1
and Model 2. The advantage to working with Model 2 rather than Model 1 is that the randomness
in the rate of the type $m-j$ mutations is eliminated.

Lemma 23. We have $\lim_{N\to\infty} |r_1(T) - r_2(T)| = 0$.
Proof. Lemma 22 with $f(N) = 1/N$ implies that with probability tending to one as $N \to \infty$, up
to time $T$ there is always at least one type 0 individual in Model 2, so hereafter we will make this
assumption. In this case, a type $m-j$ individual replaces a randomly chosen type 0 individual in
Model 2 at times of a Poisson process $K$ whose rate at time $s$ is $N\mu^{m-j}s^{m-j-1}/(m-j-1)!$. We
will first compare Model 2 to another model called Model 2', which will be the same as Model 2
except that type $m-j$ individuals arrive at times of a Poisson process $K'$ whose rate at time $s$
is $\max\{0, N\mu^{m-j}s^{m-j-1}/(m-j-1)! - \varepsilon N\mu^{m-j}T^{m-j-1}\}$, where $\varepsilon > 0$ is fixed.

Models 2 and 2' can be coupled so that births and deaths occur at the same times in both
models, and each point of $K'$ is also a point of $K$. Consequently, a coupling can be achieved
so that if an individual has type $k \ge m-j$ in Model 2', then it also has type $k$ in Model 2.
With such a coupling, the only individuals whose types are different in the two models are those
descended from individuals that in Model 2 became type $m-j$ at a time that is in $K$ but not
$K'$. The rate of points in $K$ but not $K'$ is bounded by $\varepsilon N\mu^{m-j}T^{m-j-1}$. The probability that
a given type $m-j$ individual has a type $m$ descendant is at most $C\mu^{1-2^{-j}}$ by Proposition 1.
Therefore, the probability that there is a type $m$ individual in Model 2 but not Model 2' before
time $T$ is bounded by

$$\varepsilon N\mu^{m-j}T^{m-j} \cdot C\mu^{1-2^{-j}} \le C\varepsilon, \qquad (59)$$

using (50). Therefore, letting $r_{2'}(T)$ denote the probability that there is no type $m$ individual in
Model 2' by time $T$,

$$|r_2(T) - r_{2'}(T)| \le C\varepsilon. \qquad (60)$$

We now compare Model 1 and Model 2'. These models can be coupled so that births and
deaths in the two models happen at the same times and, on $G_N(\varepsilon)$, there is a type $m-j$ mutation
in Model 1 at all of the times in $K'$. This coupling can therefore achieve the property that on
$G_N(\varepsilon)$, any individual of type $k \ge m-j$ in Model 2' also has type $k$ in Model 1. The only
individuals in Model 1 of type $k \ge m-j$ that do not have the same type in Model 2' are those
descended from individuals that became type $m-j$ at a time that is not in $K'$. On $G_N(\varepsilon)$, the
rate of type $m-j$ mutations at times not in $K'$ is bounded by $2\varepsilon N\mu^{m-j}T^{m-j-1}$. Therefore, by
the same calculation made in (59), the probability that $G_N(\varepsilon)$ occurs and that Model 1 but not
Model 2' has a type $m$ descendant by time $T$ is at most $C\varepsilon$. This bound and Lemma 18 give

$$|r_1(T) - r_{2'}(T)| \le C\varepsilon. \qquad (61)$$

The result follows from (60) and (61) after letting $\varepsilon \to 0$. □
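The coupling in which every point of $K'$ is also a point of $K$ can be realized by thinning a common dominating Poisson process with a shared uniform mark. The sketch below illustrates one such construction under our own conventions: `c` stands for $N\mu^{m-j}$, `k` for $m-j$, and the function names are hypothetical rather than taken from the paper.

```python
import math
import random

def rate_K(s, c, k):
    """Rate of K at time s: c * s**(k-1) / (k-1)!."""
    return c * s ** (k - 1) / math.factorial(k - 1)

def rate_K_prime(s, c, k, eps, horizon):
    """Rate of K' at time s: the rate of K lowered by eps * c * horizon**(k-1), floored at 0."""
    return max(0.0, rate_K(s, c, k) - eps * c * horizon ** (k - 1))

def coupled_points(c, k, eps, horizon, rng):
    """Return (K, K_prime) on [0, horizon], coupled so that K_prime is a subset of K."""
    bound = rate_K(horizon, c, k)          # rate_K is nondecreasing in s, so this dominates on [0, horizon]
    K, K_prime = [], []
    s = 0.0
    while True:
        s += rng.expovariate(bound)        # candidate point from a rate-`bound` process
        if s > horizon:
            return K, K_prime
        mark = rng.random() * bound        # one uniform mark decides membership in both processes
        if mark < rate_K(s, c, k):
            K.append(s)
            if mark < rate_K_prime(s, c, k, eps, horizon):
                K_prime.append(s)
```

Because the rate of $K'$ never exceeds the rate of $K$, a candidate accepted for $K'$ is always accepted for $K$, and the points of $K$ that are missing from $K'$ occur at rate at most $\varepsilon N\mu^{m-j}T^{m-j-1}$, matching the bound used before (59).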

5.4 Comparing Models 2 and 3 

In this subsection, we establish the following lemma. 

Lemma 24. We have $\lim_{N\to\infty} |r_2(T) - r_3(T)| = 0$.

The advantage to working with Model 3 rather than Model 2 is that in Model 3, descendants 
of only one type m — j + 1 mutation can be present in the population at a time. As a result, each 
type m — j + 1 mutation independently has probability qj of producing a type m descendant. 
With Model 2, there could be dependence between the outcomes of different type m — j + 1 
mutations whose descendants overlap in time. 
The only difference between Model 2 and Model 3 is that some type $m-j+1$ mutations
are suppressed in Model 3. Therefore, it is easy to couple Model 2 and Model 3 so that until
there are no type 0 individuals remaining in Model 2, the type of the $i$th individual in Model 2 is
always at least as large as the type of the $i$th individual in Model 3, with the only discrepancies
involving individuals descended from a type $m-j+1$ mutation that was suppressed in Model 3.
Because Lemma 22 with $f(N) = 1/N$ implies that the probability that all type zero individuals
disappear by time $T$ goes to zero as $N \to \infty$, Lemma 24 follows from the following result.

Lemma 25. In Model 2, the probability that some type $m-j+1$ mutation that occurs while
there is another individual of type $m-j+1$ or higher in the population has a type $m$ descendant
tends to zero as $N \to \infty$.

Proof. By Lemma 21, the expected number of type $m-j+1$ mutations by time $T$ is at most
$CN\mu^{m-j+1}T^{m-j+1}$. The expected amount of time, before time $T$, that there is an individual
in the population of type $m-j+1$ or higher is at most $CN\mu^{m-j+1}T^{m-j+1}(\log N)$.

By Lemma 22 with $f(N) = 1/(N\mu^{m-j}T^{m-j}\log N)$, the probability that the number of type
$m-j$ individuals stays below $N\mu^{m-j}T^{m-j}\log N$ until time $T$ tends to one as $N \to \infty$. On
this event, the expected number of type $m-j+1$ mutations by time $T$ while there is another
individual in the population of type $m-j+1$ or higher is at most

$$h_N = (CN\mu^{m-j+1}T^{m-j+1}\log N)(N\mu^{m-j}T^{m-j}\log N)\mu.$$

The probability that a given such mutation produces a type $m$ descendant is $q_j \le C\mu^{1-2^{-(j-1)}}$ by
Proposition 1, so the probability that at least one such mutation produces a type $m$ descendant
is at most

$$h_N q_j \le C[\mu T(\log N)^2][N\mu^{m-j}T^{m-j}\mu^{1-2^{-j}}]^2.$$

Because $\mu T(\log N)^2 = \mu^{2^{-j}}t(\log N)^2 \to 0$ as $N \to \infty$ and $N\mu^{m-j}T^{m-j}\mu^{1-2^{-j}}$ stays bounded as
$N \to \infty$ by (50), the lemma follows. □

5.5 Comparing Models 3 and 4 

In both Model 3 and Model 4, each type m — j + 1 mutation independently has probability qj of 
producing a type m descendant. The advantage to Model 4 is that whether or not a given type 
m — j + 1 mutation produces a type m descendant is decoupled from the evolution of the number 
of type m — j individuals. 

We first define a more precise coupling between Model 3 and Model 4. We will assume
throughout the construction that there are fewer than $N/2$ individuals in each model with type
$m-j$ or higher. Eventually this assumption will fail, but by Lemma 22 the assumption is valid
through time $T$ with probability tending to one as $N \to \infty$, which is sufficient for our purposes.

For both models, the $N$ individuals will be assigned labels $1, \dots, N$ in addition to their types.
Let $L$ be a Poisson process of rate $N$ on $[0, \infty)$, and let $I_1, I_2, \dots$ and $J_1, J_2, \dots$ be independent
random variables, uniformly distributed on $\{1, \dots, N\}$. Let $K$ be an inhomogeneous Poisson
process on $[0, \infty)$ whose rate at time $s$ is $N\mu^{m-j}s^{m-j-1}/(m-j-1)!$, and let $L_1, \dots, L_N$ be
independent rate $\mu$ Poisson processes on $[0, \infty)$. In both models, if $s$ is a point of $K$, then at
time $s$ we choose an individual at random from those that have type 0 in both models to become
type $m-j$. Birth and death events occur at the times of $L$. At the time of the $m$th point of
$L$, in both models we change the type of the individual labeled $I_m$ to the type of the individual
labeled $J_m$. In Model 4, if $I_m$ has type $m-j$ and $J_m$ has type $k \ge m-j+1$, then we choose a
type 0 individual to become type $m-j$ to keep the number of type $m-j$ individuals constant.
Likewise, in Model 4, if $I_m$ has type $k \ge m-j+1$ and $J_m$ has type $m-j$, then we choose a
type $m-j$ individual to become type 0. In both models, the individual labeled $i$ experiences
mutations at times of $L_i$, with the exceptions that type 0 individuals never get mutations and
mutations of type $m-j$ individuals are suppressed when there is already an individual of type
$m-j+1$ or higher in the population. Also, in Model 4, if $s$ is a point of $L_i$ and the individual
labeled $i$ has type $m-j$ at time $s-$, then in addition to changing the type of the individual
labeled $i$, we choose a type 0 individual to become type $m-j$ so that the number of type $m-j$
individuals stays constant.

Note that by relabeling the individuals, if necessary, after each transition, we can ensure that
for all $s \ge 0$, at time $s$ there are $\min\{Y_3(s), Y_4(s)\}$ integers $i$ such that the individual labeled
$i$ has type $m-j$ in both models. The rearranging can be done so that no individual has type
$m-j$ in one of the models and type $m-j+1$ or higher in the other. Also, with this coupling, if
a type $m-j+1$ mutation occurs at the same time in both models, descendants of this mutation
will have the same type in both models. In particular, if the mutation has a type $m$ descendant
in one model, it will have a type $m$ descendant in the other.

Let $W(s) = Y_3(s) - Y_4(s)$, which is the difference between the number of type $m-j$ individuals
in Model 3 and the number of type $m-j$ individuals in Model 4. There are three types of events
that can cause the process $(W(s), 0 \le s \le T)$ to jump:

• When a type $m-j$ individual experiences a mutation in Model 3 and becomes type $m-j+1$,
there is no change to the number of type $m-j$ individuals in Model 4. At time $s$, such
changes occur at rate either 0 or $\mu Y_3(s)$, depending on whether or not there is already an
individual in Model 3 of type $m-j+1$ or higher.

• When one of the individuals that is type $m-j$ in one process but not the other experiences
a birth or death, the $W$ process can increase or decrease by one. If $Y_3(s) > Y_4(s)$, then at
time $s$, both increases and decreases are happening at rate $|W(s)|(N - |W(s)|)/N$ because
the $W$ process changes unless the other individual involved in the exchange also has type
$m-j$ in Model 3 but not Model 4. If $Y_4(s) > Y_3(s)$, then increases and decreases are each
happening at rate $|W(s)|(N - |W(s)| - Z_4(s))/N$ because in Model 4, transitions exchanging
a type $m-j$ individual with an individual of type $m-j+1$ or higher are not permitted.

• The number of type $m-j$ individuals changes in Model 3 but not Model 4 when there is
an exchange involving one of the individuals that has type $m-j$ in both models and one
of the individuals that has type $m-j+1$ or higher in Model 4. Changes in each direction
happen at rate $Z_4(s)\min\{Y_3(s), Y_4(s)\}/N$.

Therefore, the process $(W(s), 0 \le s \le T)$ at time $s$ is increasing by one at rate $\lambda(s)$ and decreasing
by one at rate $\lambda(s) + \gamma(s)$, where

$$0 \le \gamma(s) \le \mu Y_3(s) \qquad (62)$$

and

$$\lambda(s) = \frac{|W(s)|\,(N - |W(s)| - Z_4(s)\mathbf{1}_{\{Y_4(s) > Y_3(s)\}})}{N} + \frac{Z_4(s)\min\{Y_3(s), Y_4(s)\}}{N}. \qquad (63)$$
Lemma 26. For $0 \le s \le t$, let

$$W_N(s) = \frac{W(s\mu^{-(1-2^{-j})})}{N\mu^{(m-j)2^{-j}}}.$$

Then as $N \to \infty$,

$$\max_{0 \le s \le t} |W_N(s)| \to_p 0. \qquad (64)$$



Proof. The proof is similar to the proof of Lemma 4.6 in [10]. We use Theorem 4.1 in chapter
7 of [11] to show that the processes $(W_N(s), 0 \le s \le t)$ converge as $N \to \infty$ to a diffusion
$(X(s), 0 \le s \le t)$ which satisfies the stochastic differential equation

$$dX(s) = b(X(s))\,ds + \sqrt{a(X(s))}\,dB(s) \qquad (65)$$

with $b(x) = 0$ and $a(x) = 2A^{-1-(m-j-1)2^{-j}}|x|$ for all $x$, where $A$ is the constant from (49). The
Yamada-Watanabe Theorem (see, for example, (3.3) on p. 193 of [7]) gives pathwise uniqueness
for this SDE, which implies that the associated martingale problem is well-posed.
For all $N$ and all $s \in [0, t]$, define

$$B_N(s) = -\frac{1}{N\mu^{(m-j)2^{-j}}}\int_0^{s\mu^{-(1-2^{-j})}} \gamma(r)\,dr = -\frac{1}{N\mu^{1+(m-j-1)2^{-j}}}\int_0^s \gamma(r\mu^{-(1-2^{-j})})\,dr$$

and

$$A_N(s) = \int_0^s \frac{2\lambda(r\mu^{-(1-2^{-j})}) + \gamma(r\mu^{-(1-2^{-j})})}{(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}}\,dr.$$

At time $s$, the process $(W_N(s), 0 \le s \le t)$ experiences positive jumps by $1/(N\mu^{(m-j)2^{-j}})$ at
rate $\lambda(s\mu^{-(1-2^{-j})})\mu^{-(1-2^{-j})}$ and negative jumps by the same amount at the slightly larger rate
$(\lambda(s\mu^{-(1-2^{-j})}) + \gamma(s\mu^{-(1-2^{-j})}))\mu^{-(1-2^{-j})}$. Therefore, letting $M_N(s) = W_N(s) - B_N(s)$, the
processes $(M_N(s), 0 \le s \le t)$ and $(M_N^2(s) - A_N(s), 0 \le s \le t)$ are martingales. We claim that as
$N \to \infty$,

$$\sup_{0 \le s \le t} |B_N(s)| \to_p 0 \qquad (66)$$

and

$$\sup_{0 \le s \le t}\left|A_N(s) - 2A^{-1-(m-j-1)2^{-j}}\int_0^s |W_N(r)|\,dr\right| \to_p 0. \qquad (67)$$

The results (66) and (67) about the infinitesimal mean and variance respectively enable us to
deduce from Theorem 4.1 in chapter 7 of [11] that as $N \to \infty$, the processes $(W_N(s), 0 \le s \le t)$
converge in the Skorohod topology to a process $(X(s), 0 \le s \le t)$ satisfying (65). Because
$W_N(0) = 0$ for all $N$, we have $X(0) = 0$, and therefore $X(s) = 0$ for $0 \le s \le t$. The result (64)
follows.

To complete the proof, we need to establish (66) and (67). Equation (62) and Lemma 22 with
$f(N) = t/(N\mu^{(m-j-1)2^{-j}})$ imply that as $N \to \infty$,

$$\sup_{0 \le s \le t} |B_N(s)| \le \frac{t}{N\mu^{(m-j-1)2^{-j}}}\max_{0 \le s \le T} Y_3(s) \to_p 0,$$

which proves (66).
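To prove (67), note that

$$A_N(s) - 2A^{-1-(m-j-1)2^{-j}}\int_0^s |W_N(r)|\,dr = \int_0^s \left(\frac{2\lambda(r\mu^{-(1-2^{-j})}) + \gamma(r\mu^{-(1-2^{-j})})}{(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}} - \frac{2A^{-1-(m-j-1)2^{-j}}\,|W(r\mu^{-(1-2^{-j})})|}{N\mu^{(m-j)2^{-j}}}\right) dr.$$

It therefore follows from (62) and (63) that

$$\begin{aligned}
\sup_{0 \le s \le t}\left|A_N(s) - 2A^{-1-(m-j-1)2^{-j}}\int_0^s |W_N(r)|\,dr\right|
&\le \sup_{0 \le s \le t}\int_0^s \left|\frac{2}{(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}} - \frac{2A^{-1-(m-j-1)2^{-j}}}{N\mu^{(m-j)2^{-j}}}\right| |W(r\mu^{-(1-2^{-j})})|\,dr \\
&\quad + \sup_{0 \le s \le t}\int_0^s \frac{2W(r\mu^{-(1-2^{-j})})^2 + 2|W(r\mu^{-(1-2^{-j})})|\,Z_4(r\mu^{-(1-2^{-j})})}{N(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}}\,dr \\
&\quad + \sup_{0 \le s \le t}\int_0^s \frac{2Z_4(r\mu^{-(1-2^{-j})})\min\{Y_3(r\mu^{-(1-2^{-j})}), Y_4(r\mu^{-(1-2^{-j})})\}}{N(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}}\,dr \\
&\quad + \sup_{0 \le s \le t}\int_0^s \frac{\mu Y_3(r\mu^{-(1-2^{-j})})}{(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}}\,dr.
\end{aligned} \qquad (68)$$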



We need to show that the four terms on the right-hand side of (68) each converge in probability
to zero. Because $t$ is fixed, in each case it suffices to show that the supremum of the integrand
over $r \in [0, t]$ converges in probability to zero as $N \to \infty$. We have

$$\sup_{0 \le s \le T}\left|\frac{2}{(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}} - \frac{2A^{-1-(m-j-1)2^{-j}}}{N\mu^{(m-j)2^{-j}}}\right| |W(s)| = \sup_{0 \le s \le T}\left|\frac{2}{N\mu^{1+(m-j-1)2^{-j}}} - \frac{2}{A^{1+(m-j-1)2^{-j}}}\right| \frac{|W(s)|}{N\mu^{(m-j)2^{-j}}} \to_p 0$$

by Lemma 22 because $|W(s)| \le \max\{Y_3(s), Y_4(s)\}$ and the first factor goes to zero as $N \to \infty$
by (49). Thus, the first term in (68) converges in probability to zero. Also, $N\mu^{1-2^{-j}} \to \infty$ as
$N \to \infty$, so Lemma 22 gives

$$\sup_{0 \le s \le T}\frac{W(s)^2 + |W(s)|Z_4(s)}{N(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}} = \sup_{0 \le s \le T}\left(\frac{|W(s)|}{N\mu^{(m-j)2^{-j}}(N\mu^{1-2^{-j}})^{1/2}}\right)\left(\frac{|W(s)| + Z_4(s)}{N\mu^{(m-j)2^{-j}}(N\mu^{1-2^{-j}})^{1/2}}\right) \to_p 0,$$

which is enough to control the second term in (68). The same argument works for the third term,
using $Z_4(s)Y_4(s)$ in the numerator of the left-hand side in place of $W(s)^2 + |W(s)|Z_4(s)$. Finally,

$$\sup_{0 \le s \le T}\frac{\mu Y_3(s)}{(N\mu^{(m-j)2^{-j}})^2\mu^{1-2^{-j}}} = \sup_{0 \le s \le T}\frac{Y_3(s)}{N\mu^{(m-j)2^{-j}}}\cdot\frac{\mu}{N\mu^{1+(m-j-1)2^{-j}}} \to_p 0$$

by Lemma 22 because $\mu \to 0$ as $N \to \infty$ and $N\mu^{1+(m-j-1)2^{-j}}$ is bounded away from zero as
$N \to \infty$ by (49). Therefore, the fourth term on the right-hand side of (68) converges in probability
to zero, which completes the proof of (67). □
Lemma 27. In both Model 3 and Model 4, the probability that there is a type $m-j+1$ mutation
before time $T$ that has a type $m$ descendant born after time $T$ converges to zero as $N \to \infty$.

Proof. The same argument works for both models. Let $\varepsilon > 0$. By Lemma 21, the expected
number of type $m-j+1$ mutations by time $T$ is at most $CN\mu^{m-j+1}T^{m-j+1}$. Since $N\mu^{1-2^{-j}} \to \infty$
as $N \to \infty$, we have $\varepsilon T \ll N$. Therefore, by (10), the probability that a given mutation stays in
the population for a time at least $\varepsilon T$ before dying out or fixating is at most $C/(\varepsilon T)$. It follows
that the probability that some type $m-j+1$ mutation before time $T$ lasts for a time at least
$\varepsilon T$ is at most

$$C\varepsilon^{-1}N\mu^{m-j+1}T^{m-j} \le C\varepsilon^{-1}N\mu^{1+(m-j)2^{-j}} \to 0$$

as $N \to \infty$ by (49). Thus, with probability tending to one as $N \to \infty$, all type $m-j+1$
mutations that have a descendant alive at time $T$ originated after time $(1-\varepsilon)T$.

Arguing as above, the expected number of type $m-j+1$ mutations between times $(1-\varepsilon)T$
and $T$ is at most $C\varepsilon N\mu^{m-j+1}T^{m-j+1}$, and the probability that a given such mutation has a type $m$
descendant is $q_j \le C\mu^{1-2^{-(j-1)}}$ by Proposition 1. Thus, the probability that some type $m-j+1$
mutation between times $(1-\varepsilon)T$ and $T$ has a type $m$ descendant is at most

$$C\varepsilon N\mu^{m-j+1}T^{m-j+1}\mu^{1-2^{-(j-1)}} \le C\varepsilon N\mu^{1+(m-j-1)2^{-j}} \le C\varepsilon \qquad (69)$$

by (49). The lemma follows by letting $\varepsilon \to 0$. □

Lemma 28. We have $\lim_{N\to\infty} |r_3(T) - r_4(T)| = 0$.

Proof. For $i = 3, 4$, let $D_i$ be the event that no type $m-j+1$ mutation that occurs before time
$T$ has a type $m$ descendant. By Lemma 27, it suffices to show that

$$\lim_{N\to\infty} |P(D_3) - P(D_4)| = 0. \qquad (70)$$

Recall that Model 3 and Model 4 are coupled so that when a type $m-j+1$ mutation occurs
at the same time in both models, it will have a type $m$ descendant in one model if and only if
it has a type $m$ descendant in the other. Therefore, $|P(D_3) - P(D_4)|$ is at most the probability
that some type $m-j+1$ mutation that occurs in one process but not the other has a type $m$
descendant. There are two sources of type $m-j+1$ mutations that occur in one process but not
the other. Some type $m-j+1$ mutations are suppressed in one model but not the other because
there is already an individual of type $m-j+1$ or higher in the population. That the probability
of some such mutation having a type $m$ descendant goes to zero follows from the argument used
to prove Lemma 25, which is also valid for Model 3 and Model 4. The other type $m-j+1$
mutations that appear in one process but not the other occur when one of the $|W(s)|$ individuals
that has type $m-j$ in one model but not the other gets a mutation. Let $\varepsilon > 0$. By Lemma 26,
for sufficiently large $N$,

$$P\left(\max_{0 \le s \le T} |W(s)| \le \varepsilon N\mu^{(m-j)2^{-j}}\right) > 1 - \varepsilon.$$

Therefore, on an event of probability at least $1 - \varepsilon$, the expected number of type $m-j+1$
mutations that occur in one model but not the other and have a type $m$ descendant is at most

$$\varepsilon N\mu^{(m-j)2^{-j}} \cdot \mu T q_j \le C\varepsilon N\mu^{1+(m-j-1)2^{-j}} \le C\varepsilon$$

by Proposition 1 and (49). The result follows by letting $\varepsilon \to 0$. □
5.6 Comparing Models 4 and 5

In both Model 4 and Model 5, type $m-j$ individuals appear at times of a Poisson process whose
rate at time $s$ is $N\mu^{m-j}s^{m-j-1}/(m-j-1)!$. In both models, type $m-j$ individuals experience
mutations that will lead to type $m$ descendants at rate $\mu q_j$. The two models differ in the following
three ways:

• In Model 4, some type $m-j+1$ mutations are suppressed because there is another individual
of type $m-j+1$ or higher already in the population.

• In Model 4, some time elapses between the time of the type $m-j+1$ mutation that will
produce a type $m$ descendant, and the time that the type $m$ descendant appears.

• In Model 4, when there are $k$ individuals of type $m-j$ and $\ell$ individuals of type $m-j+1$
or higher, the rate at which the number of type $m-j$ individuals increases (or decreases)
by one is $k(N-\ell)/N$ because the number of type $m-j$ individuals changes only when a
type $m-j$ individual is exchanged with a type 0 individual. This rate is simply $k$ in Model
5. An additional complication is that the factor $(N-\ell)/N$ is not independent of whether
previous type $m-j+1$ mutations are successful in producing type $m$ descendants.

We prove Lemma 29 below by making three modifications to Model 4 to eliminate these
differences, and then comparing the modified model to Model 5. Lemmas 20, 23, 24, 28, and 29
immediately imply part 3 of Theorem 4.

Lemma 29. We have $\lim_{N\to\infty} |r_4(T) - r_5(T)| = 0$.

Proof. We obtain Model 4' from Model 4 by making the following modifications. First, whenever
a type $m-j+1$ mutation is suppressed in Model 4 because there is another individual in the
population of type $m-j+1$ or higher, in Model 4' we add a type $m$ individual with probability $q_j$.
Second, whenever a type $m-j+1$ mutation occurs in Model 4 that will eventually produce a type
$m$ descendant, we change the type of the mutated individual in Model 4' to type $m$ immediately.
Third, for every type $m-j+1$ mutation in Model 4', including the events that produce a type
$m$ individual that were added in the first modification, if there are $\ell$ individuals of type $m-j$ or
higher in the population, then we suppress the mutation with probability $\ell/N$. This means that
at all times, every type $m-j$ individual in Model 4' experiences a mutation that will produce a
type $m$ descendant at rate $\mu q_j(N-\ell)/N$, while new type $m-j$ individuals appear and disappear
at rate $k(N-\ell)/N$. Note that the number of type $m-j$ individuals is always the same in Model
4' as in Model 4. Let $r_{4'}(T)$ be the probability that there is no type $m$ individual in Model 4' by
time $T$.

Lemma 25, whose proof is also valid for Model 4', implies that with probability tending to
one as $N \to \infty$, the first modification above does not cause a type $m$ individual to be added
to Model 4' before time $T$. Lemma 27 implies this same result for the second modification. As
for the third modification, let $\varepsilon > 0$, and let $D_N$ be the event that the number of individuals
of type $m-j$ or higher in Model 4 stays below $\varepsilon N$ through time $T$. By Lemma 22, we have
$\lim_{N\to\infty} P(D_N) = 1$. By Lemma 21, the expected number of type $m-j+1$ mutations by time
$T$ is at most $CN\mu^{m-j+1}T^{m-j+1}$. On $D_N$, we always have $\ell/N \le \varepsilon$, so the probability that
$D_N$ occurs and a type $m-j+1$ mutation that produces a type $m$ descendant in Model 4 gets
suppressed in Model 4' is at most $CN\mu^{m-j+1}T^{m-j+1} \cdot q_j\varepsilon \le C\varepsilon$, using (69) and Proposition 1.
Thus,

$$\limsup_{N\to\infty} |r_4(T) - r_{4'}(T)| \le \varepsilon. \qquad (71)$$

It remains to compare Model 4' and Model 5. In Model 5, when there are $k$ type $m-j$
individuals, the rates that type $m-j$ individuals appear, disappear, and give rise to a type
$m$ individual are $k$, $k$, and $k\mu q_j$ respectively, as compared with $k(N-\ell)/N$, $k(N-\ell)/N$, and
$k\mu q_j(N-\ell)/N$ respectively in Model 4'. Consequently, Model 4' is equivalent to Model 5 slowed
down by a factor of $(N-\ell)/N$, which on $D_N$ stays between $1-\varepsilon$ and 1. We can obtain a
lower bound for $r_{4'}(T)$ by considering Model 5 run all the way to time $T$, so $r_{4'}(T) \ge r_5(T)$. An
upper bound for $r_{4'}(T)$ on $D_N$ is obtained by considering Model 5 run only to time $T(1-\varepsilon)$, so
$r_{4'}(T) \le r_5((1-\varepsilon)T) + P(D_N^c)$. Now $\lim_{N\to\infty} r_5((1-\varepsilon)T)$ is given by the right-hand side of (52)
with $(1-\varepsilon)t$ in place of $t$. Therefore, by letting $N \to \infty$ and then $\varepsilon \to 0$, we get

$$\lim_{N\to\infty} |r_{4'}(T) - r_5(T)| = 0,$$

which, combined with (71), proves the lemma. □
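The sandwich between Model 5 run to time $(1-\varepsilon)T$ and Model 5 run to time $T$ can be explored by simulating Model 5 and recording when the first type $m$ individual appears. The sketch below is illustrative only. It assumes our own parametrization (`c` for $N\mu^{m-j}$, `k` for $m-j$, `r` for $\mu q_j$), treats immigration by thinning a constant dominating rate, and uses hypothetical function names.

```python
import math
import random

def first_type_m_time(c, k, r, horizon, rng):
    """Time at which the first type m individual appears in Model 5,
    or math.inf if none appears by `horizon`."""
    bound = c * horizon ** (k - 1) / math.factorial(k - 1)   # largest immigration rate on [0, horizon]
    n = 0                                  # current number of type m-j individuals
    s = 0.0
    while True:
        total = bound + n * (2.0 + r)      # candidate immigration + births + deaths + mutations
        s += rng.expovariate(total)
        if s > horizon:
            return math.inf
        u = rng.random() * total
        if u < bound:
            # Candidate immigration, accepted with probability (true rate at s) / bound.
            if rng.random() * bound < c * s ** (k - 1) / math.factorial(k - 1):
                n += 1
        elif u < bound + n:
            n += 1                         # birth
        elif u < bound + 2 * n:
            n -= 1                         # death
        else:
            return s                       # a type m-j individual becomes type m

def survival_estimate(times, horizon):
    """Fraction of runs in which no type m individual has appeared by `horizon`."""
    return sum(t > horizon for t in times) / len(times)
```

Evaluating `survival_estimate` on the same list of simulated times at `horizon` and at `(1 - eps) * horizon` gives estimates of $r_5(T)$ and $r_5((1-\varepsilon)T)$ from coupled runs, and the second is never smaller than the first.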